ABSTRACT

Low-frequency radio observations of neutral hydrogen during and before the epoch
of cosmic reionization will provide 1000 quasi-independent source planes, each
of precisely known redshift, if a resolution of ~ 1 arcminute or better can be
attained. These planes can be used to reconstruct the projected mass distribution of
foreground material. Structure in these source planes is linear and gaussian at high
redshift (30 < z < 300) but is nonlinear and nongaussian during reionization. At
both epochs, significant power is expected down to sub-arcsecond scales. We
demonstrate that this structure can, in principle, be used to make mass images with a formal
signal-to-noise per pixel exceeding 10, even for pixels as small as an arcsecond. With
an ideal telescope, both resolution and signal-to-noise can exceed those of even the
most optimistic idealized mass maps from galaxy lensing by more than an order of
magnitude. Individual dark halos similar in mass to that of the Milky Way could be
imaged with high signal-to-noise out to z ~ 10. Even with a much less ambitious
telescope, a wide-area survey of 21 cm lensing would provide very sensitive constraints on
cosmological parameters, in particular on dark energy. These are up to 20 times tighter
than the constraints obtainable from comparably sized, very deep surveys of galaxy
lensing, although the best constraints come from combining data of the two types.
Any radio telescope capable of mapping the 21 cm brightness temperature with good
frequency resolution (~ 0.05 MHz) over a band of width ≳ 10 MHz should be able to
make mass maps of high quality. The planned Square Kilometer Array (SKA) may be
able to map the mass with moderate signal-to-noise down to arcminute scales,
depending on the reionization history of the universe and the ability to subtract foreground
sources.



1 INTRODUCTION 

Dark matter appears to be the dominant component of all structures larger than individual galaxies. In the standard paradigm 
its gravitational effects drive the linear growth and the subsequent nonlinear collapse of the fluctuations detected at z ~ 1000 
in the cosmic microwave background (CMB). Our inability to "see" the dark matter, and so to image its distribution, has 
prevented a definitive observational verification of this paradigm. Simulations of structure formation predict all galaxies and 
galaxy clusters to sit within extended dark halos with regular and well-specified structural properties, but it has proved 
difficult to test these predictions convincingly. As first demonstrated by Kaiser & Squires (1993) the distortion of the images 
of distant objects caused by gravitational lensing can be used to reconstruct an image of the foreground mass distribution. 
All successful applications so far have used distant galaxies as the sources. The resolution and signal-to-noise of the resulting 
maps are fundamentally limited by the abundance and intrinsic ellipticity of these sources. Even with deep satellite data the 
effective density of usable galaxies does not exceed about 100 per sq.arcmin. For a map with 1 arcmin pixels this corresponds 
to a signal-to-noise per pixel (ratio of rms expected physical fluctuation to rms noise fluctuation) of about 0.75; for 10 arcmin 
pixels this ratio is about 2.5. As a result, only the centers of the most massive galaxy clusters can be detected at high
signal-to-noise in galaxy-based mass maps. In this paper we show that much higher resolution and effective signal-to-noise can, in
principle, be achieved by using high redshift neutral hydrogen as the source, rather than galaxies. 

There has been a great deal of interest in the possibility of observing the hyperfine transition of hydrogen (the 21 cm
line) from the intergalactic (or pregalactic) medium at high redshift (see Furlanetto, Oh & Briggs 2006, for an extensive
review). There are two essentially disjoint epochs from which 21 cm radiation should be observable. At a redshift of z ~ 300 
neutral hydrogen (HI) became thermally decoupled from the CMB. The gas kinetic temperature then fell below the CMB 
temperature Tr due to their different adiabatic cooling laws. For a while the spin temperature Ts remained coupled to the 
kinetic temperature by atomic collisions, but at z ~ 30 the collision rate became so low that the spin temperature decoupled 
from the kinetic temperature and returned to equilibrium with the CMB. During the period 30 < z < 300, Ts was below Tr
and there was a net absorption of CMB photons through the 21 cm line. The observable quantity is the brightness temperature
Tb = (Ts − Tr)(1 − e^{−τ}) ≈ (Ts − Tr) τ, which depends on the optical depth, τ, which is in turn proportional to the density of HI. A
map of Tb on the sky and in frequency would thus be a three dimensional map of the HI density, which is directly proportional
to the mass density at these redshifts. The physics during this epoch of 21 cm absorption is simple, and predictions within 
the standard cosmogony are straightforward and robust. 
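
The relation between brightness temperature, spin temperature and optical depth quoted above can be illustrated with a short sketch. The numerical values below (redshift, optical depth, and the ratio of spin to CMB temperature) are assumptions chosen only for illustration; they are not taken from this paper.

```python
import math

def brightness_temperature(t_spin, t_cmb, tau):
    """Exact form: Tb = (Ts - Tr) * (1 - exp(-tau))."""
    # -expm1(-tau) equals 1 - exp(-tau) but is accurate for small tau
    return (t_spin - t_cmb) * (-math.expm1(-tau))

def brightness_temperature_thin(t_spin, t_cmb, tau):
    """Optically thin limit: Tb ~ (Ts - Tr) * tau."""
    return (t_spin - t_cmb) * tau

# Assumed illustrative values, not results from the paper.
z = 50.0
t_cmb = 2.725 * (1.0 + z)   # CMB temperature in K at redshift z
t_spin = 0.8 * t_cmb        # spin temperature assumed 20% below the CMB
tau = 0.02                  # assumed small 21 cm optical depth

exact = brightness_temperature(t_spin, t_cmb, tau)
thin = brightness_temperature_thin(t_spin, t_cmb, tau)
print(f"exact Tb = {exact:.4f} K, thin-limit Tb = {thin:.4f} K")
```

With Ts below Tr the result is negative, i.e. net absorption, and the two forms differ at relative order τ/2.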

The second epoch with observable 21 cm effects is considerably less well characterized. It is known that almost all 
intergalactic hydrogen at z < 6.5 is ionized. It is believed that radiation from the first generation of stars and/or quasars 
caused this reionization between z ~ 6.5 and z ~ 30. The latest CMB constraints give 8.5 < z_reion < 22 at 68% confidence
(Spergel et al. 2006). A variety of mechanisms will transfer energy from X-ray and/or Lyman-α radiation to the HI gas during
reionization, thereby raising Ts above the CMB temperature and making the 21 cm line visible in emission. After reionization
is complete, too little HI is left to be observable. The mean free path for X-rays through the neutral IGM at z < 15 can
exceed the Hubble length, so the spin temperature for much of the HI could have been raised uniformly before significant 
reionization occurred. Lyman-continuum radiation is expected to produce ionized bubbles that expand until they overlap. 
Reionization finally completes as the last interbubble clumps are evaporated. During this period, Ly-α radiation passes freely
through ionized regions but is resonantly scattered in neutral regions, thereby raising their spin temperature and producing 21 
cm emission. How rapid and inhomogeneous this process was is highly uncertain and is likely to remain so until it is directly 
measured. It is also possible that shock heating of HI gas during the collapse of pregalactic objects could raise Ts enough for 
21 cm emission to be visible before reionization begins (Kuhlen, Madau & Montgomery 2006). 

Gravitational lensing distorts our image of the 21 cm emission and absorption by moving the angular positions of points 
on the sky while keeping the associated surface brightness (and thus brightness temperature) unchanged. For this reason a 
smooth background radiation field is unaffected by lensing. The observed map of brightness temperature thus reflects both the 
intrinsic structure of the fluctuations and the lensing distortions. To separate the two, we use the fact that in a given direction 
the intrinsic structure of maps at sufficiently separated frequencies (hence redshifts) will be statistically uncorrelated, while 
the foreground lensing distribution will be the same. Below we show how the maps can be combined so as to average out 
the intrinsic temperature fluctuations while preserving the lensing signal. In essence, the gradients of brightness temperature 
maps at a set of sufficiently well-spaced frequencies are independently and isotropically distributed in the absence of lensing, 
but display a coherence which is a direct measure of the lensing-induced shear when the foreground mass distribution is taken 
into account. 

Gravitational lensing of pregalactic 21 cm signals has previously been considered by several authors. In particular, Zahn 
& Zaldarriaga (2006) extended to 3 dimensions (angle on the sky + redshift of source) the techniques developed by Hu (2001)
for detecting lensing in the CMB, and they applied them to high-redshift 21 cm emission. In retrospect we find that the 
Fourier-space version of the method presented here is related to their method (see Appendices [B] and [C] for details) and that 
our method is related to one developed by Seljak & Zaldarriaga (1999) for detecting lensing in the CMB. Cooray (2004) had 
already discussed applying the original 2-dimensional Hu (2001) method to the 21 cm absorption epoch, but this misses the 
main advantage offered by the radio technique, namely the large number of available quasi-independent source planes. Finally, 
Pen (2004) discussed measuring gravitational lensing effects in the 21 cm emission by looking for anisotropic effects on the 
second order statistics of the brightness fluctuations. This does not estimate the gravitational shear directly by comparing 
maps at different frequencies in the same direction, and so is much less sensitive than the approaches suggested here and by 
Zahn & Zaldarriaga (2006). 

The 21 cm emission/absorption has two major advantages over the CMB as a background source for lensing studies. Since 
lensing conserves surface brightness, it can only redistribute structure that already exists in the source. The CMB has very 
little structure on the angular scales where lensing is significant (≲ 1 arcmin) so that lensing effects are very weak. The second
advantage is that the CMB provides only one temperature field on the sky while the 21 cm emission/absorption provides 
many, all of which are lensed by the same foreground mass distribution. Although the CMB comes from higher redshift, this 
is a relatively minor advantage since most of the structure detected by lensing is at much smaller redshift than either source. 

Our paper is organized as follows. In section 2 an estimator for the gravitational shear is derived and in section 3 the noise
in that estimator is discussed and quantified using a particular model for correlations in the 21 cm brightness temperature.
The expected lensing signal and the size of objects that could be detected are calculated in section 4. The prospects for measuring
cosmological parameters with 21 cm lensing are discussed in section 5. The observational prospects given currently planned
telescope designs are discussed in section 6. In the appendices several technical issues are addressed and alternative methods for
measuring the lensing signal are described.


2 AN ESTIMATOR FOR THE GRAVITATIONAL SHEAR

The observed deviation in the brightness temperature of the 21 cm emission at a redshift $z$ (or equivalently frequency $\nu$) and
a point on the sky, $\theta$, will be denoted $\tilde T(\theta,\nu)$. We seek to construct a statistic from this temperature that, when summed over
frequency bands, $\nu$, preserves the lensing signal while smoothing out the fluctuations in $\tilde T(\theta,\nu)$. A statistic will have these
properties if it has the same properties when averaged over an ensemble of temperature fields at a fixed $\nu$ while keeping the
lensing contribution fixed. All statistics that are first-order in $\tilde T(\theta,\nu)$ vanish with this averaging because of isotropy.

We will now show that it is possible to isolate the lensing contribution in the second order statistics of the gradient
of the temperature field, $\nabla\tilde T(\theta,\nu)$. The small angle, or "flat sky", approximation will be used throughout this paper and is
well justified for the angular scales that are considered. The observed temperature at a point on the sky, $\theta$, is the source
temperature at $\theta' = \theta + \alpha(\theta,\nu)$ plus noise, where $\theta'$ is the position on the source plane (what the position would be in the
absence of lensing) and $\alpha(\theta,\nu)$ is the position shift caused by lensing (hereafter the deflection). Thus the observed gradient
of the temperature will be

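$$\nabla_k \tilde T(\theta,\nu) = \left(\delta_{ki} + a_{ki}(\theta,\nu)\right)\nabla_i T(\theta',\nu) + N_k(\theta,\nu), \qquad a_{ki}(\theta,\nu) = \frac{\partial\alpha_i(\theta,\nu)}{\partial\theta_k}, \tag{1}$$

where $T(\theta,\nu)$ is the real, unlensed brightness temperature and $N(\theta,\nu)$ is the noise in the measured gradient. Repeated indices
are summed over. The square of the magnitude of the observed gradient will be

$$|\nabla\tilde T(\theta)|^2 = |\nabla T(\theta')|^2 + \left(2a_{ij}(\theta) + a_{ik}(\theta)a_{jk}(\theta)\right)\nabla_i T(\theta')\nabla_j T(\theta') + 2\left(\delta_{ij} + a_{ij}(\theta)\right)N_i(\theta)\nabla_j T(\theta') + |N(\theta)|^2, \tag{2}$$

where the $\nu$'s have been left out for brevity.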

The source emission, the deflection and the noise will all be statistically independent so we can consider them separately. 

Averaging over the source gives 

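$$\langle\nabla_i T(\theta',\nu)\rangle = 0, \tag{3}$$

$$\langle\nabla_i T(\theta',\nu)\nabla_j T(\theta',\nu)\rangle = \tfrac{1}{2}\,\sigma_\nabla^2(\nu)\,\delta_{ij}, \tag{4}$$

where this defines $\sigma_\nabla^2(\nu)$ and $\delta_{ij}$ is the Kronecker delta.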

The distortion matrix can be decomposed into quantities that are commonly used in lensing, the convergence $\kappa$, the
shear $\gamma$ and a rotation parameter $\beta$,

$$a = \begin{pmatrix} \kappa+\gamma_1 & \gamma_2-\beta \\ \gamma_2+\beta & \kappa-\gamma_1 \end{pmatrix}. \tag{6}$$

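Using this decomposition we find $a_{ik}(\theta,\nu)a_{jk}(\theta,\nu)$ to correspond to the matrix

$$\begin{pmatrix} \kappa^2+\gamma^2+\beta^2+2(\gamma_1\kappa-\gamma_2\beta) & 2(\gamma_2\kappa+\gamma_1\beta) \\ 2(\gamma_2\kappa+\gamma_1\beta) & \kappa^2+\gamma^2+\beta^2-2(\gamma_1\kappa-\gamma_2\beta) \end{pmatrix}.$$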
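
The matrix product above can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample values of the convergence, shear and rotation are illustrative assumptions.

```python
def distortion(kappa, g1, g2, beta=0.0):
    """Distortion matrix a of equation (6)."""
    return [[kappa + g1, g2 - beta],
            [g2 + beta, kappa - g1]]

def a_a_transpose(a):
    """Explicit contraction a_ik a_jk for a 2x2 matrix."""
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def closed_form(kappa, g1, g2, beta=0.0):
    """Closed-form entries quoted in the text."""
    diag = kappa ** 2 + g1 ** 2 + g2 ** 2 + beta ** 2
    cross = 2.0 * (g1 * kappa - g2 * beta)
    off = 2.0 * (g2 * kappa + g1 * beta)
    return [[diag + cross, off],
            [off, diag - cross]]

# Arbitrary small test values (assumed for illustration only).
params = (0.05, 0.03, -0.02, 0.001)
numeric = a_a_transpose(distortion(*params))
analytic = closed_form(*params)
max_diff = max(abs(numeric[i][j] - analytic[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```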



The rotation term $\beta$ comes from coupling between different lens planes and is second order in the surface density. It is expected
to be very small in nearly all cases so we will neglect it in what follows, although its inclusion would be straightforward. Because
of isotropy and the requirement that the usual angular size distance be correct on average, we have $[a_{ij}(\theta,\nu)]_\Omega = 0$, where
$[\ldots]_\Omega$ denotes an average over direction on the sky, and

$$\sigma_\kappa^2(\nu) \equiv [\kappa^2(\nu)]_\Omega = [\gamma^2(\nu)]_\Omega, \tag{7}$$

where $\gamma^2 = \gamma_1^2 + \gamma_2^2$. The second equality follows from the deflection field being a potential field (i.e. assuming $\beta = 0$).
We now construct three second-order quantities from the observed temperature gradient, 

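$$\Gamma_1(\theta,\nu) = \tfrac{1}{2}\left(\nabla_1\tilde T(\theta,\nu)\nabla_1\tilde T(\theta,\nu) - \nabla_2\tilde T(\theta,\nu)\nabla_2\tilde T(\theta,\nu)\right), \tag{8}$$

$$\Gamma_2(\theta,\nu) = \nabla_1\tilde T(\theta,\nu)\nabla_2\tilde T(\theta,\nu), \tag{9}$$
and

$$\Gamma_3(\theta,\nu) = \tfrac{1}{2}\,|\nabla\tilde T(\theta,\nu)|^2, \tag{10}$$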

where the indices on the gradient symbols refer to the two axes of the chosen orthogonal coordinate system. For a given 
direction (and hence deflection field) the averages of these are 

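$$\langle\Gamma_1(\theta,\nu)\rangle = \sigma_\nabla^2(\nu)\,\gamma_1(\theta,\nu)\left(1+\kappa(\theta,\nu)\right) + \tfrac{1}{2}\left(\langle N_1(\nu)N_1(\nu)\rangle - \langle N_2(\nu)N_2(\nu)\rangle\right), \tag{11}$$

$$\langle\Gamma_2(\theta,\nu)\rangle = \sigma_\nabla^2(\nu)\,\gamma_2(\theta,\nu)\left(1+\kappa(\theta,\nu)\right) + \langle N_1(\nu)N_2(\nu)\rangle, \tag{12}$$

$$\langle\Gamma_3(\theta,\nu)\rangle = \tfrac{1}{2}\,\sigma_\nabla^2(\nu)\left(1 + 2\kappa(\theta,\nu) + \kappa(\theta,\nu)^2 + \gamma(\theta,\nu)^2\right) + \tfrac{1}{2}\,\sigma_N^2(\nu), \tag{13}$$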

where $\sigma_N^2(\nu)$ is the average of $|N(\theta)|^2$ over random realizations of the noise. If the noise is isotropic it will drop out of both
(11) and (12). This can be seen by expressing the noise vector in terms of its magnitude and polar angle and then requiring
that the direction be random. However, in general the noise may not be isotropic so we retain these terms. To lowest order
the first terms in the averages (11) and (12) are proportional to the gravitational shear and (13) is related to the convergence.
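
The averages (11)-(13) follow from propagating the second moments (4) of the unlensed gradient through the linear map (1). A minimal sketch of that bookkeeping, assuming $\beta = 0$ and an arbitrary (possibly anisotropic) noise covariance; the function name and the numerical inputs are hypothetical.

```python
def expected_gammas(sigma2, kappa, g1, g2, noise_cov=((0.0, 0.0), (0.0, 0.0))):
    """Return (<Gamma_1>, <Gamma_2>, <Gamma_3>) for a fixed distortion.

    sigma2    : variance of the unlensed gradient, <|grad T|^2>
    noise_cov : 2x2 covariance <N_i N_j> of the gradient noise
    """
    # (delta_ki + a_ki) with beta = 0
    m = [[1.0 + kappa + g1, g2],
         [g2, 1.0 + kappa - g1]]
    # covariance of the observed gradient: (sigma2/2) m m^T + noise
    cov = [[0.5 * sigma2 * sum(m[i][k] * m[j][k] for k in range(2)) + noise_cov[i][j]
            for j in range(2)] for i in range(2)]
    gamma1 = 0.5 * (cov[0][0] - cov[1][1])
    gamma2 = cov[0][1]
    gamma3 = 0.5 * (cov[0][0] + cov[1][1])
    return gamma1, gamma2, gamma3

# Assumed illustrative inputs.
sigma2, kappa, g1, g2 = 2.0, 0.04, 0.03, -0.01
noise = ((0.3, 0.05), (0.05, 0.1))
e1, e2, e3 = expected_gammas(sigma2, kappa, g1, g2, noise)
print(e1, e2, e3)
```

Setting the noise covariance proportional to the identity removes the noise terms from the first two averages, as stated in the text.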

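The lensing signal from a single redshift slice will be dominated by noise so we wish to add up frequency channels to
reduce the noise. The convergence and shear are slowly varying functions of $\nu$ at the high redshifts we are considering, so for
now we will assume $\gamma(\theta,\nu)$ to be independent of $\nu$ within the frequency band being used. This suggests estimating the shear
at a point on the sky through

$$\hat\gamma_i(\theta) = \sum_\nu \omega_\nu\left\{\Gamma_i(\theta,\nu) - [\Gamma_i(\nu)]_\Omega\right\}, \tag{14}$$

where the sum is over frequency channels. The weights, $\omega_\nu$, are normalized so that the mean values are

$$\langle\hat\gamma_{1,2}(\theta)\rangle = \gamma_{1,2}(\theta)\left(1+\kappa(\theta)\right) \simeq \gamma_{1,2}(\theta), \tag{15}$$

and

$$\langle\hat\gamma_3(\theta)\rangle = \kappa(\theta) + \tfrac{1}{2}\left(\kappa(\theta)^2 + \gamma(\theta)^2 - 2\sigma_\kappa^2\right) \simeq \kappa(\theta). \tag{16}$$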

The weights will be determined in the next section. Except along exceptional lines of sight (through the very centers of galaxies
and galaxy clusters) $\kappa(\theta)$ is much smaller than one. As we show explicitly below, the variance is thus small, and (14) in
effect provides an unbiased map of $\gamma(\theta)$ all the way back to the beginning of structure formation. We will sometimes refer to
$\hat\gamma_i(\theta)$ as the shear estimators even though the 3rd component is an estimator for the convergence, and $\hat\gamma_3(\theta)$ will sometimes
be written as $\hat\kappa(\theta)$.
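
The idea of stacking frequency channels can be illustrated with a small Monte Carlo experiment. This is a sketch under simplifying assumptions that are not part of the paper's calculation: channels are statistically independent, there is no instrumental noise (so the sky averages of the first two statistics vanish and are not subtracted), the gradient is gaussian, and the weights are taken as $1/(n\,\sigma_\nabla^2)$ so that $\sum_\nu \omega_\nu \sigma_\nabla^2(\nu) = 1$. All function names and numbers are hypothetical.

```python
import random
import statistics

def shear_estimate(kappa, gamma1, gamma2, sigma2_channels, rng):
    """Stack Gamma_1 and Gamma_2 over channels for one line of sight."""
    n = len(sigma2_channels)
    est1 = est2 = 0.0
    for s2 in sigma2_channels:
        weight = 1.0 / (n * s2)      # sum over channels of weight * s2 equals 1
        sd = (0.5 * s2) ** 0.5       # each gradient component has variance s2 / 2
        t1, t2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)   # unlensed gradient
        # lensed gradient from equation (1), beta = 0 and no noise
        o1 = (1.0 + kappa + gamma1) * t1 + gamma2 * t2
        o2 = gamma2 * t1 + (1.0 + kappa - gamma1) * t2
        est1 += weight * 0.5 * (o1 * o1 - o2 * o2)        # Gamma_1
        est2 += weight * o1 * o2                          # Gamma_2
    return est1, est2

def simulate(kappa, gamma1, gamma2, n_channels, n_sightlines, seed=0):
    """Mean and scatter of the first shear estimator over many sightlines."""
    rng = random.Random(seed)
    # arbitrary unequal channel variances, to show the weights absorb them
    sigma2 = [1.0 + 0.5 * (i % 3) for i in range(n_channels)]
    first = [shear_estimate(kappa, gamma1, gamma2, sigma2, rng)[0]
             for _ in range(n_sightlines)]
    return statistics.mean(first), statistics.stdev(first)

if __name__ == "__main__":
    mean_hat, scatter = simulate(0.02, 0.05, -0.03, 500, 300, seed=1)
    print(mean_hat, scatter)
```

In this idealized setting the mean should approach $\gamma_1(1+\kappa)$ and the scatter per component should be close to $1/(2\sqrt{n})$ for $n$ independent channels.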



3 NOISE LEVELS 

3.1 Instrumental, foreground and irreducible noise 

There will be a number of sources of noise in the estimators (14). In particular, there will be noise from the instrumentation,
from terrestrial interference, and from incomplete subtraction of galactic and extragalactic foreground emission. This noise is 
encapsulated in the N{9, v) vector field. We will refer to these sources of noise collectively as foreground noise. It is expected 
that foreground emission will be removed to high accuracy by using the fact that it varies slowly with frequency, whereas the 
21 cm emission/absorption signal (and particularly the angular gradient of this signal) decorrelates for even small separations 
along the line-of-sight (see Zaldarriaga, Furlanetto & Hernquist 2004; Santos, Cooray, & Knox 2005). The removal process 
could, however, leave noise with correlations in both frequency and position on the sky. For currently planned generations 
of instruments, this residual is expected to be as small as or smaller than the purely instrumental or thermal noise. The 
lensing signal is also coherent in frequency, but foreground subtraction will not affect it because lensing is multiplicative while
the foregrounds are additive; see equation (1). Lensing does not cause correlations between frequency channels; it causes
spatial correlations within a frequency channel that are the same as in the other channels. 

In addition to foreground noise there is noise from the randomness of the $\nabla T(\theta,\nu)$ field itself. Clearly, this cannot be
reduced by any improvement in technology or foreground subtraction, so we will refer to it as the irreducible noise. It depends 
only on the intrinsic correlations in the 21 cm signals and on the range of frequency, or redshift, over which the signals are 
mapped. We will find that for any telescope which is able to map the 21 cm signals, the total noise in the shear estimate will 
automatically be near the irreducible value. For this reason it is both a lower limit and a good benchmark. 

We must also differentiate between the noise per pixel and the noise in the average $\gamma(\theta)$ over a patch of sky which is
larger than the pixel size. By pixel we mean the smallest resolvable region of the sky as set by the telescope. If the angular
correlations in the noise drop off more rapidly than the correlations in the shear, then the signal-to-noise ratio will be maximal
on an angular scale that is larger than the pixel size. We will refer to a region of sky over which $\gamma(\theta)$ is averaged as a patch
in the shear map. A patch could be a square region, a circular aperture, a gaussian smoothing window or any other localized 
window. 

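The variances in the magnitude of our shear estimators, (14), are given by

$$\tilde\sigma_\gamma^2(\delta\theta) = \int\!\!\int d^2\theta\, d^2\theta'\, W(\theta;\delta\theta)\,W(\theta';\delta\theta) \sum_\nu\sum_{\nu'} \omega_\nu\omega_{\nu'} \sum_{i=1}^{2} \langle\Gamma_i(\nu,\theta)\,\Gamma_i(\nu',\theta')\rangle, \tag{17}$$

$$\begin{aligned}
\tilde\sigma_\kappa^2(\delta\theta) &\equiv \tilde\sigma_{\gamma_3}^2(\delta\theta) && (18)\\
&= \int\!\!\int d^2\theta\, d^2\theta'\, W(\theta;\delta\theta)\,W(\theta';\delta\theta) \sum_\nu\sum_{\nu'} \omega_\nu\omega_{\nu'} \left[\langle\Gamma_3(\nu,\theta)\,\Gamma_3(\nu',\theta')\rangle - \langle\Gamma_3(\nu)\rangle\langle\Gamma_3(\nu')\rangle\right], && (19)
\end{aligned}$$

where $W(\theta;\delta\theta)$ is the window function defining the patch (normalized to one when integrated over $\theta$) and $\delta\theta$ is its
characteristic angular scale. The noise in the magnitude of the shear per pixel (i.e. in the original unsmoothed detection) is $\tilde\sigma_\gamma(0)$ while
the noise in the isotropic estimator is $\tilde\sigma_\kappa(\delta\theta)$. Note that the tildes are used to differentiate between noise in the estimator,
$\tilde\sigma_\gamma$, and the variance in the signal, $\sigma_\gamma$. With some assumptions the correlation functions can be simplified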

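$$\begin{aligned}
\sum_{i=1}^{2} \langle\Gamma_i(\nu,\theta)\,\Gamma_i(\nu',\theta')\rangle &= 2\,\langle\Gamma_2(\nu,\theta)\,\Gamma_2(\nu',\theta')\rangle && (20)\\
&= 2\,\langle\nabla_1\tilde T(\nu,\theta)\,\nabla_2\tilde T(\nu,\theta)\,\nabla_1\tilde T(\nu',\theta')\,\nabla_2\tilde T(\nu',\theta')\rangle && (21)\\
&= 2\,\langle\nabla_1\tilde T(\nu,\theta)\,\nabla_1\tilde T(\nu',\theta')\rangle^2 && (22)
\end{aligned}$$

and

$$\langle\Gamma_3(\nu,\theta)\,\Gamma_3(\nu',\theta')\rangle - \langle\Gamma_3(\nu)\rangle\langle\Gamma_3(\nu')\rangle = \langle\nabla_1\tilde T(\nu,\theta)\,\nabla_1\tilde T(\nu',\theta')\rangle^2. \tag{23}$$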

In (20) we used the fact that rotational invariance requires that the noise in both components of $\hat\gamma(\theta)$ be the same, so we
choose to find the variance in the simpler $\Gamma_2(\nu)$ and double it to account for the other component. To get from (21) to (22)
we have assumed that each component of $\nabla\tilde T(\nu)$ is normally distributed so that the fourth moment can be reduced to second
moments. In addition, we assume that the temperature field is isotropic so that there is no cross-correlation between the
components. The same assumptions are used in expression (23).
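
The reduction of the fourth moment to second moments can be written out explicitly. The following is a sketch of that step using the standard gaussian moment theorem (Isserlis' theorem); the shorthand $g_i$ and $g_i'$ for the observed gradient components at the two points is introduced here only for compactness.

```latex
% Shorthand (assumed): g_i = \nabla_i \tilde T(\nu,\theta),
%                      g_i' = \nabla_i \tilde T(\nu',\theta').
\begin{align*}
\langle g_1 g_2\, g_1' g_2' \rangle
  &= \langle g_1 g_2\rangle \langle g_1' g_2'\rangle
   + \langle g_1 g_1'\rangle \langle g_2 g_2'\rangle
   + \langle g_1 g_2'\rangle \langle g_2 g_1'\rangle
   && \text{(gaussian moment theorem)} \\
  &= \langle g_1 g_1'\rangle \langle g_2 g_2'\rangle
   && \text{(no cross-correlation between components)} \\
  &= \langle g_1 g_1'\rangle^2
   && \text{(both components equally correlated).}
\end{align*}
```

Multiplying by the factor of two from (20) gives (22).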

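Replacing the observed gradient in (22) with the true temperature gradient plus noise gives the result

$$\begin{aligned}
\tilde\sigma_\gamma^2(\delta\theta) &= 2\,\tilde\sigma_\kappa^2(\delta\theta) && (24)\\
&= \frac{1}{2} \sum_\nu\sum_{\nu'} \omega_\nu\omega_{\nu'}\, A(\nu,\nu',\delta\theta), && (25)
\end{aligned}$$

where

$$A(\nu,\nu',\delta\theta) = \int d^2\theta\; \mathcal{W}(\theta;\delta\theta) \left\{ \xi_\nabla(\nu,\nu',\theta)^2 + 2\,\xi_\nabla(\nu,\nu',\theta)\,\xi_N(\nu,\nu',\theta) + \xi_N(\nu,\nu',\theta)^2 \right\}, \tag{26}$$

$$\mathcal{W}(\theta;\delta\theta) = \int d^2\theta'\; W(\theta'+\theta/2;\delta\theta)\,W(\theta'-\theta/2;\delta\theta), \tag{27}$$

and

$$\xi_\nabla(\nu,\nu',\theta=|\theta'-\theta''|) = \sum_{i=1}^{2} \langle\nabla_i T(\nu,\theta')\,\nabla_i T(\nu',\theta'')\rangle \quad\text{implying}\quad \sigma_\nabla^2(\nu) = \xi_\nabla(\nu,\nu,0). \tag{28}$$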

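The correlation function $\xi_N(\nu,\nu',\theta)$ is similarly defined. The pixel and frequency response functions are included in $T(\theta)$.

The optimal weights, $\omega_\nu$, can be calculated numerically, but a very good analytic approximation can be found by assuming
that they vary slowly over the frequency range in which $T(\nu)$ is correlated ($A(\nu',\nu') \simeq A(\nu,\nu)$). In this case $\omega_{\nu'} \simeq \omega_\nu$ in (25).
Minimizing this subject to the constraint (15) gives the weights

$$\omega_\nu = \frac{\sigma_\nabla^2(\nu)}{\sum_{\nu'} A(\nu,\nu',\delta\theta)} \left[ \sum_{\nu''} \frac{\sigma_\nabla^4(\nu'')}{\sum_{\nu'} A(\nu'',\nu',\delta\theta)} \right]^{-1}. \tag{29}$$

Substituting this back into (25) gives the noise

$$\tilde\sigma_\gamma^{-2}(\delta\theta) = 2 \sum_\nu \frac{\sigma_\nabla^4(\nu)}{\sum_{\nu'} A(\nu,\nu',\delta\theta)}. \tag{30}$$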



If the noise in the brightness temperature map is small compared to the fluctuations in the temperature itself, $\xi_N(\nu,\theta) < \xi_\nabla(\nu,\theta)$,
which is a minimal requirement for mapping the brightness temperature, then the foreground noise will drop out of
(26) and the irreducible noise limit will be reached. To approach this noise level it is not necessary to eliminate all foreground
noise. Thus a telescope designed to map the brightness temperature will naturally achieve a noise level in $\hat\gamma(\theta)$ that is close
to the irreducible value. The correlations in frequency might be set by the bandwidth or by the intrinsic correlations in the
brightness temperature.
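
The weighting scheme described above can be sketched numerically. The code below assumes the slowly-varying-weight approximation, in which the quadratic form (25) becomes $\frac{1}{2}\sum_\nu \omega_\nu^2 \sum_{\nu'} A(\nu,\nu')$, and it takes the normalization constraint to be $\sum_\nu \omega_\nu \sigma_\nabla^2(\nu) = 1$, which is what the mean values of the estimators imply. The function names and the toy inputs are hypothetical.

```python
def optimal_weights(sigma2, a_matrix):
    """Weights minimizing the approximate noise under sum(w * sigma2) = 1.

    sigma2[i]      : gradient variance in channel i
    a_matrix[i][j] : A(nu_i, nu_j) at fixed patch size
    """
    row_sums = [sum(row) for row in a_matrix]
    raw = [s2 / rs for s2, rs in zip(sigma2, row_sums)]   # Lagrange-multiplier solution
    norm = sum(r * s2 for r, s2 in zip(raw, sigma2))
    return [r / norm for r in raw]

def shear_noise_variance(weights, a_matrix):
    """Evaluate (1/2) sum_{nu, nu'} w_nu w_nu' A(nu, nu') without approximation."""
    n = len(weights)
    return 0.5 * sum(weights[i] * weights[j] * a_matrix[i][j]
                     for i in range(n) for j in range(n))

# Toy case (assumed): three independent channels, irreducible limit A_ii = sigma2_i^2.
sigma2 = [1.0, 2.0, 4.0]
a_matrix = [[1.0, 0.0, 0.0],
            [0.0, 4.0, 0.0],
            [0.0, 0.0, 16.0]]
weights = optimal_weights(sigma2, a_matrix)
variance = shear_noise_variance(weights, a_matrix)
print(weights, variance)
```

For independent channels in the irreducible limit the variance reduces to one over twice the number of channels, whatever the individual channel variances are.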

It is often convenient to express the lensing noise (30) in terms of the (cross-)power spectra of the brightness temperature
$C_\ell(\nu,\nu')$ and the noise in the temperature $C_\ell^N(\nu,\nu')$. This can be done by Fourier transforming the temperature and gives



and 



$$\sigma_\nabla^2(\nu) = \int d^2\ell\; \ell^2\, C_\ell(\nu,\nu). \tag{32}$$

For a further discussion of calculating things in Fourier space see Appendices B and C.

To determine the possible capabilities of 21 cm lensing experiments we now investigate the optimal case in which the
irreducible limit is reached with a bandwidth that is much smaller than the intrinsic correlation length of the temperature. In
the limit of infinitely narrow bandwidths the sums in eq. (30) can be converted into integrals. The function $A(\nu,\nu',\theta)$ defines
a volume in frequency and angle within which the structure or noise is too strongly correlated to contribute "independent"
information to the shear measurement. A very useful approximation to this volume can be found by calculating its characteristic
length in frequency at $\theta = 0$ and its characteristic angular area at $\nu' = \nu$. For the temperature gradient alone these are

$$\Delta\nu_\nabla(\nu) = \int d\nu' \left[\frac{\xi_\nabla(\nu,\nu',0)}{\sigma_\nabla^2(\nu)}\right]^2 \tag{33}$$

and

$$\Delta\Omega_\nabla(\nu) = \int d^2\theta \left[\frac{\xi_\nabla(\nu,\nu,\theta)}{\sigma_\nabla^2(\nu)}\right]^2. \tag{34}$$

Analogous correlation lengths can be defined for the noise term and for the cross-term. Note that these correlation lengths 
are defined with the correlation function squared, temperature to the fourth power, which makes them significantly smaller 
than the usual correlation lengths defined with the first power of the correlation function. 

When the patch size is near $\Delta\Omega_\nabla(\nu)$ there will be only a few quasi-independent areas per patch. To account for this it
is convenient to define the quantity

$$\mathcal{N}_\nabla(\nu;\delta\theta) = \int d^2\theta \left[\frac{\xi_\nabla(\nu,\nu,\theta)}{\sigma_\nabla^2(\nu)}\right]^2 \mathcal{W}(\theta;\delta\theta). \tag{35}$$

This quantity is essentially the area of a correlated region divided by the area of the patch. Two limiting cases are instructive.
For a very small patch $\mathcal{N}_\nabla(\nu;\delta\theta) \to 1$ and for a patch much larger than the intrinsic correlation length $\mathcal{N}_\nabla(\nu;\delta\theta) \to
\mathcal{W}(0;\delta\theta)\,\Delta\Omega_\nabla(\nu)$ ($\simeq \Delta\Omega_\nabla(\nu)/(4\pi\,\delta\theta^2)$ for a Gaussian patch). One would like the data to be collected in frequency channels
with a width smaller than $\Delta\nu_\nabla(\nu)$, otherwise the irreducible noise will be increased. Instrumental design or foreground noise
actually result in there being an optimal bandwidth that is near $\Delta\nu_\nabla(\nu)$ as will be shown in section 6.
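Using the above definitions in (30) a simple approximation for the irreducible noise is found,

$$\tilde\sigma_\gamma^2(\delta\theta) \simeq \frac{1}{2} \left[ \int \frac{d\nu}{\mathcal{N}_\nabla(\nu;\delta\theta)\,\Delta\nu_\nabla(\nu)} \right]^{-1}. \tag{36}$$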
When the patch size is very close to the pixel size the complete integrals in (26) must be carried out to obtain an accurate
result, but for the purposes of this section this is not necessary. The limit for small $\delta\theta$ is the noise per pixel. It is easy to see
from (36) and (35) that the square of the irreducible noise is essentially one over twice the number of correlated volumes in
a patch.
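
The counting argument in the last sentence can be turned into a back-of-envelope estimate. The sketch below assumes a single, frequency-independent correlation length and correlated-area fraction across the band, which is a simplification; the function name and the input numbers are hypothetical and are not results from this paper.

```python
import math

def irreducible_noise(bandwidth_mhz, corr_length_mhz, area_fraction=1.0):
    """Rough noise estimate: sqrt(1 / (2 * number of correlated volumes)).

    bandwidth_mhz   : total band used for the stack
    corr_length_mhz : frequency correlation length (assumed constant over the band)
    area_fraction   : correlated area divided by patch area (1 for a single pixel)
    """
    n_volumes = bandwidth_mhz / (corr_length_mhz * area_fraction)
    return math.sqrt(1.0 / (2.0 * n_volumes))

# Assumed illustrative numbers: 100 MHz band, 0.1 MHz correlation length.
per_pixel = irreducible_noise(100.0, 0.1)
per_patch = irreducible_noise(100.0, 0.1, area_fraction=0.25)
print(per_pixel, per_patch)
```

Averaging over a patch that contains four correlated areas halves the noise relative to a single pixel in this approximation.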

The above estimates assume that $\sigma_\nabla^2(\nu)$, the variance in the intrinsic temperature within a frequency channel, can be
measured exactly so that the estimators $\hat\gamma_i$ can be normalized properly. This is normally a good approximation as we now
show. The variance in the gradient can be found by averaging over position on the sky in the entire surveyed region. Using
(2) and dropping all terms higher than second order in $\kappa$ and $\gamma$ results in

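$$\begin{aligned}
\cdots &= \Big[\cdots\,|\nabla\tilde T(\nu)|^2\,\cdots\Big]_\Omega && (37)\\
&= \cdots \sum_\nu\sum_{\nu'} \omega_\nu\omega_{\nu'} \int d^2\theta\, \Big\{ \xi_\nabla(\nu,\nu',\theta)^2 \left(1 + 2\sigma_\kappa^2 + 2\,\xi_\kappa(\theta)\right) && (38)\\
&\qquad + 2\,\xi_\nabla(\nu,\nu',\theta)\,\xi_N(\nu,\nu',\theta) \left(1 + \xi_\kappa(\theta)\right) && (39)\\
&\qquad + \xi_N(\nu,\nu',\theta)^2 \Big\} + \cdots && (40)
\end{aligned}$$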

where the area of the surveyed region is $\Omega$. It has been assumed that $\sigma_N^2$ is determined by independent means to an accuracy
much better than the above. The correlations in the lensing convergence are relatively small ($\sigma_\kappa, \xi_\kappa(\theta) \ll 1$) so the terms
containing them on lines (38) and (39) can be safely ignored. The last term in (40) expresses the uncertainty in the mismatch
between the average $\kappa$ (or $\gamma$) over the survey region and the true average.

Figure 1. The frequency correlation length, $\Delta\nu_\nabla(z)$, defined in equation (33) as a function of redshift. The power spectrum of 21 cm
emission is taken to be the same as that of the dark matter, including linear velocity distortions and nonlinear structure formation. The
dotted-dashed curve is for a gaussian pixel of radius $\delta\theta$ = 5 arcmin, the dotted curve is for 1 arcmin, the solid curve is for 0.5 arcmin,
the dot-dot-dot-dash curve is for 0.1 arcmin, and the dashed curve is for 0.05 arcmin (3 arcsec). In the $\delta\theta$ = 0.05 arcmin case the decrease
in the correlation length at small redshifts is caused by nonlinear structure formation. This effect is present in the other cases but to a
lesser extent.

Figure 2. The expected irreducible noise in the shear measurement per pixel. This is a plot of expression (36) with $\delta\theta = 0$ and assuming
a CDM dark matter power spectrum for the 21 cm brightness temperature. The dot-dash curves are for a pixel radius of $\delta\theta$ = 5 arcmin,
the dotted curves are for 1 arcmin, the solid curves are for 0.5 arcmin and the dashed curves are for 0.05 arcmin (3 arcsec). The upper
limit of the redshift range used in the measurement is the abscissa, $z_2$. For each pixel size the five curves are for different lower redshift
limits. They are from left to right (or down to up) $z_1$ = 6.5, 12, 22, 40 and 71.

By comparing (38)-(40) with equation (26) one can see that the variance in the normalization is approximately
$\Omega^{-1}\left(\Delta\Omega_\nabla\,\tilde\sigma_\gamma^2(0) + \Delta\Omega_\kappa\,\sigma_\kappa^2\right)$, where $\Delta\Omega_\kappa$ is the area of sky
over which the foreground convergence is correlated. (Note that $\Delta\Omega_\kappa$ is defined with one power of $\xi_\kappa(\theta)$ instead of two like
$\Delta\Omega_\nabla$.) Both these correlated areas could effectively be as small as the pixel if sparse pointings are used for normalizing $\hat\gamma(\theta)$.
If a survey covers just a few independent regions and is capable of mapping the shear (so that $\tilde\sigma_\gamma \lesssim \sigma_\gamma$), then the noise in the
normalization of the shear map will be small, and $\tilde\sigma_\gamma(\delta\theta)$, as obtained above, can be taken as the noise in the shear estimate.

3.2 Correlations in the 21 cm emission 

The irreducible noise in the shear map depends critically on the number of statistically independent regions of 21 cm
emission/absorption along a single line-of-sight. At the redshifts where the 21 cm brightness temperature is significant the density
of the universe was dominated by ordinary matter so the comoving length between two redshifts is well approximated by the
flat universe formula

$$l_{co} \simeq \frac{2c}{H_0\sqrt{\Omega_m}} \left[ (1+z_1)^{-1/2} - (1+z_2)^{-1/2} \right]. \tag{41}$$

Between redshift 10 and 100, for example, $l_{co} \simeq 2200\,h^{-1}$ Mpc for $\Omega_m = 0.3$, or $96.5\,h^{-1}$ Mpc in proper distance. Roughly
speaking, the correlation length (33) is ~ 0.1 to 1 Mpc (comoving) for a pixel size of 0.5 arcmin in radius or smaller so we
expect of order 1,000 independent samples between these redshifts. A more detailed calculation must take into account the
precise form of the correlations in the brightness temperature.

Figure 3. The angular correlation function at fixed frequency for different pixel sizes $\delta\theta$. The pixel radii are 5, 1, 0.5 and 0.05 arcmin.
Larger pixels have deeper minima. These are for a source redshift of z = 18.9, but all except the $\delta\theta$ = 0.05 arcmin case are very nearly
independent of redshift. The dashed curve is the result for the completely pixel-dominated Poisson process case.
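
The comoving length between two redshifts in a flat, matter-dominated approximation, $l_{co} \simeq (2c/H_0\sqrt{\Omega_m})\,[(1+z_1)^{-1/2} - (1+z_2)^{-1/2}]$, is easy to evaluate directly. This is a minimal sketch; the function names are hypothetical, and the sample count simply divides the path length by an assumed correlation length in the same units.

```python
import math

C_OVER_H0 = 2997.92458  # Hubble length c/H0 in h^-1 Mpc, for H0 = 100 h km/s/Mpc

def comoving_length(z1, z2, omega_m=0.3):
    """Comoving separation in h^-1 Mpc between z1 < z2 (matter-dominated, flat)."""
    prefactor = 2.0 * C_OVER_H0 / math.sqrt(omega_m)
    return prefactor * ((1.0 + z1) ** -0.5 - (1.0 + z2) ** -0.5)

def independent_samples(z1, z2, corr_length, omega_m=0.3):
    """Path length divided by a correlation length given in h^-1 Mpc."""
    return comoving_length(z1, z2, omega_m) / corr_length

length = comoving_length(10.0, 100.0)
print(f"comoving length between z = 10 and z = 100: {length:.0f} h^-1 Mpc")
```

The result is close to the value of about 2200 h^-1 Mpc quoted in the text for a matter density of 0.3.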

The irreducible noise is independent of the normalization of the correlation function $\xi_\nabla(\nu,\nu',\theta)$ and thus will depend only
on the shape of the 3-dimensional correlation function or power spectrum. During the early epoch of 21 cm absorption the 
brightness temperature will be correlated in the same way as the dark matter (Loeb & Zaldarriaga 2004). During reionization 
the correlations could be very different. One expectation is that "bubbles" of ionized gas will form and expand until they 
merge. The size of the bubbles depends on the abundance and spatial distribution of sources of ionizing radiation; AGN 
produce larger bubbles and stars smaller bubbles. These bubbles may or may not be smaller than the pixel - a 1 arcmin pixel 
has a comoving width of 1.9 Mpc at z = 10. In what follows we make the assumption that the brightness temperature 
is proportional to the dark matter density even during reionization. We consider this conservative, because modifications to 
the power spectrum during reionization are more likely to shorten the correlation length (and so to reduce the noise) than to 
increase it. The contribution of ionized bubbles will increase the correlations on scales larger than the characteristic bubble 
size and suppress them somewhat on scales smaller. There have been a number of attempts to model the fluctuations in the
brightness temperature during reionization (Furlanetto, Zaldarriaga & Hernquist 2004; Wyithe & Loeb 2004). We tried the
simple model of Santos et al. (2003) and find that it produces very little difference in the irreducible noise for $\delta\theta$ = 0.5 arcmin
because the bubbles are significantly smaller than the pixel sizes. However, in the absence of either a complete theory of 
reionization or direct observations, the form of the temperature correlations remains a significant source of uncertainty in 
what follows, especially for small pixel sizes. 

The brightness temperature in direction $\theta$ and at frequency $\nu$ is given by

(42)

where $q_\nu(r)$ is the response function of the telescope expressed as a function of distance instead of frequency and $r(\nu)$ is the
comoving distance to the redshift from which the 21 cm line is observed at frequency $\nu$. Since peculiar velocities will change
the observed frequencies of the 21 cm line, $r(\nu)$ is not actually the radial distance, but rather the redshift expressed as a
distance.

High-resolution imaging of the cosmic mass distribution from gravitational lensing of pregalactic HI

Using this we can find the correlation function between the gradient of the temperature at different redshifts. This can
be done in spherical coordinates, but it comes out much more simply in the small angle approximation. The bandwidth will
initially be treated as infinitely narrow, $q_\nu(r) = \delta(r(\nu) - r)$ (see Appendices B and C for a treatment of finite bandwidths). The
result expressed as an integral over Fourier space is

$$\xi_\nabla(\nu,\nu',\theta) = \langle\nabla T(\theta,\nu)\cdot\nabla T(\theta',\nu')\rangle = \int_0^\infty \cdots$$

where $\Delta r(\nu,\nu') = r(\nu) - r(\nu')$, $\bar r(\nu,\nu') = (r(\nu) + r(\nu'))/2$ and $P_{21}(k,\nu)$ is the 3-dimensional power spectrum of the 21 cm 
brightness temperature. It has been assumed that the power spectrum, $P_{21}(k,\nu)$, does not change significantly over the range 
in $\nu$ where there are significant correlations. The linear redshift distortions (Kaiser 1987) are responsible for the $\beta$ term. The 
parameter $\beta$ is given by $\beta = \Omega_m(z)^{0.6}/b(z)$ to a very good approximation, where $b(z)$ is the bias between the matter and the 
$T_{21}$ fluctuations and $\Omega_m(z)$ is the density of matter in units of the critical density at that time. Here we have assumed that 
$P_{21}(k) = b^2 P_{\rm matter}(k)$ and is thus proportional to the matter power spectrum. In the calculations that follow we take $b = 1$, 
as expected at least during the early epoch of 21 cm absorption. 
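
As a rough numerical illustration of the redshift-distortion parameter, the sketch below evaluates $\beta = \Omega_m(z)^{0.6}/b(z)$. The flat universe with a cosmological constant and the values $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ are assumptions made here for illustration, and the function names are hypothetical.

```python
def omega_m_of_z(z, om0=0.3, ol0=0.7):
    """Matter density in units of the critical density at redshift z.

    Assumption: flat universe with a cosmological constant.
    """
    matter = om0 * (1.0 + z) ** 3
    return matter / (matter + ol0)


def beta(z, bias=1.0, om0=0.3, ol0=0.7):
    """Redshift-distortion parameter beta = Omega_m(z)**0.6 / b(z)."""
    return omega_m_of_z(z, om0, ol0) ** 0.6 / bias


if __name__ == "__main__":
    for z in (10, 30, 100):
        print(z, round(beta(z), 4))
```

At the redshifts relevant for 21 cm observations the universe is very nearly matter dominated, so with $b = 1$ the parameter is close to unity.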

With this result and with an assumed pixel profile, the frequency correlation length (33), the angular correlation area (34) 
and the irreducible noise (36) can all be calculated. The nonlinear evolution of the power spectrum is of some significance 
for the smaller pixel widths considered here. To account for this we use the Peacock & Dodds (1996) method to convert the 
linear power spectrum to a nonlinear one. 

Figure 1 shows $\Delta\nu_\nabla(z)$ as a function of redshift for a circular gaussian pixel with various radii $\delta\theta$. The decrease in $\Delta\nu_\nabla(z)$ 
with increasing redshift is largely the result of a fixed comoving distance corresponding to a smaller frequency interval at 
higher redshift. The correlation length also increases with increasing pixel size, but it is always less than 0.4 MHz, even when 
the pixel is 5 arcmin in radius. 
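
The statement that a fixed comoving distance spans a smaller frequency interval at higher redshift can be checked with a short sketch. It uses $|d\nu| = \nu_{21} H(z)\,dr / [c\,(1+z)^2]$, which follows from $\nu = \nu_{21}/(1+z)$ and $dr = c\,dz/H(z)$. The flat cosmology with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and $H_0 = 70$ km/s/Mpc is an assumption, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458      # speed of light in km/s
NU21_MHZ = 1420.40575    # rest frequency of the 21 cm line in MHz


def hubble(z, h0=70.0, om=0.3, ol=0.7):
    """Hubble rate in km/s/Mpc (assumption: flat universe, cosmological constant)."""
    return h0 * math.sqrt(om * (1.0 + z) ** 3 + ol)


def frequency_interval_mhz(delta_r_mpc, z):
    """Observed frequency interval spanned by a comoving radial distance at redshift z."""
    return delta_r_mpc * hubble(z) * NU21_MHZ / (C_KM_S * (1.0 + z) ** 2)


if __name__ == "__main__":
    for z in (10, 30, 100):
        print(z, frequency_interval_mhz(1.9, z))
```

With these assumed parameters, the 1.9 Mpc comoving width quoted above for z = 10 corresponds to roughly 0.1 MHz, and the interval shrinks approximately as $(1+z)^{-1/2}$ in the matter-dominated era.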

The irreducible noise per pixel, $\sigma_{\hat\kappa}(0)$, is shown in figure 2 for a few pixel sizes and ranges in redshift. It can be seen that 
smaller pixel sizes give smaller irreducible noise per pixel. It is not yet clear over what range in z the 21 cm emission/absorption 
will be detectable. This depends on the history of reionization, on the subtraction of foregrounds and on telescope sensitivity. 
If the reionization epoch lasts from z ~ 10 to 20 and this whole redshift range can be observed, then the expected irreducible 
noise is 2% for a $\delta\theta = 0.5$ arcmin pixel and 0.6% for a 3 arcsec pixel. The early epoch of 21 cm absorption lasts from 
z ~ 30 to 300. If this whole range could be observed, then we expect $\sigma_{\hat\kappa}(0) = 1.7\%$ and 0.6% for the same pixel sizes. It is 
possible that both epochs of emission/absorption will someday be observable, reducing the noise still further. 

The angular correlation function at fixed frequency is shown in figure 3 for several different pixel sizes. To a good 
approximation, the correlation function scales as 



(45) 



where $f_\theta$ is a constant. The somewhat awkward normalization is chosen so that $2\pi \int x f(x)\,dx = 1$ and $f_\theta$ is unity to within 
a few % if the brightness temperature follows the CDM density field. We retain $f_\theta$ as a fudge factor which could differ 
significantly from one if brightness temperature is not distributed like mass. This approximate scaling is a result of the power 
spectrum being almost scale-free on the relevant scales. It is a very useful approximation with important consequences, because 
it means that the smaller the pixel, the lower the irreducible noise for a fixed area on the sky. The scaling can be understood 
by considering the limiting case where the temperature is a Poisson process with correlations only on scales much smaller 
than the pixel so that the pixelization dominates the observed correlations. In this case 
and $f_\theta = \pi/4$. This limiting case is also shown in figure 3. The angular correlation is very nearly frequency independent 
because comoving angular size distance is a slow function of redshift at these high redshifts, and because the power spectrum 
of temperature fluctuations does not change shape during linear evolution. There is some dependence on $\nu$ for the smallest 
pixel radius ($\delta\theta = 0.05$ arcmin), reflecting nonlinear structure formation effects on these small scales (~ 100 kpc) at the lower 
redshifts. The correlation area is a simple function of $\delta\theta$: to a very good approximation $\Delta\Omega_\nabla = 4 f_\theta\,\delta\theta^2$. Note that we use the 
radius of the gaussian to characterize the pixel rather than its full width at half maximum (fwhm). 
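
The correlation-area scaling is simple enough to evaluate directly. The sketch below, with hypothetical function and constant names, computes $\Delta\Omega_\nabla = 4 f_\theta\,\delta\theta^2$ and shows that the pixel-dominated Poisson value $f_\theta = \pi/4$ gives an area of $\pi\,\delta\theta^2$.

```python
import math

# f_theta for the pixel-dominated Poisson limit quoted in the text
POISSON_F_THETA = math.pi / 4.0


def correlation_area(pixel_radius, f_theta=1.0):
    """Angular correlation area 4 * f_theta * pixel_radius**2.

    The result is in the square of whatever unit pixel_radius is given in.
    """
    return 4.0 * f_theta * pixel_radius ** 2


if __name__ == "__main__":
    for radius in (0.05, 0.5, 5.0):  # pixel radii in arcmin
        print(radius, correlation_area(radius), correlation_area(radius, POISSON_F_THETA))
```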

Because of this simple scaling of correlated area with pixel size, a simple expression for the irreducible noise per patch 
can be found 
Figure 4. The root mean square value of $\kappa$ smoothed with a gaussian window on the sky as a function of source redshift. The curves 
from top to bottom are for patch sizes of 0.05, 0.2, 0.5, 1, 2, 5 and 10 arcmin respectively. 

Figure 5. The contribution to $\sigma_\kappa^2$ from different redshifts, $d\sigma_\kappa^2/d\ln z$. The curves end at the source redshifts z = 1, 10 and 100. The 
dotted curves are for a patch of radius 0.05 arcmin, the solid curves for 0.5 arcmin and the dashed curves for 5 arcmin. 

To connect the two asymptotes the formulae (35) and (36) must be used. The scaling as $\delta\Theta^{-1}$ on large scales just reflects the 
fact that correlations in temperature gradient are negligible on scales significantly larger than the pixel. The prefactor might 
be different if brightness temperature turns out not to be distributed like dark matter density. If the brightness temperature 
correlations have strongly non-power-law behavior on the relevant scales, then $f_\theta$ will show some dependence on the pixel size. 
For example, if the temperature distribution were smooth on small scales, then making the pixel smaller would provide no 
further information and the noise would not continue to decrease with pixel size. As mentioned before, the correlation length 
might also be smaller during reionization, in which case $\sigma_{\hat\kappa}(\delta\Theta)$ might also be smaller. This effect must be minor, however, 
since the correlation length cannot be smaller than for the completely pixel-dominated Poisson case and, as figure 3 shows, 
this is only slightly smaller than that of our standard model. 

In Appendix C we present an alternative derivation of the noise in Fourier or visibility-space that agrees very well with 
the one given here. 

Figure 6. The root mean square convergence and the irreducible noise in the convergence ($\sigma_{\hat\kappa} = \sigma_{\hat\gamma}/\sqrt{2}$) as a function of patch size. 
The solid curve is $\sigma_\kappa(\delta\Theta)$ for a source redshift of z = 100 and the dashed curve is the same for z = 10. The dotted curves are $\sigma_{\hat\kappa}(\delta\Theta)$ 
for different pixel sizes: from top to bottom $\delta\theta$ = 5, 1, 0.5 and 0.05 arcmin with a gaussian pixel. The normalizations, $\sigma_{\hat\kappa}(0)$, are chosen 
to be representative, but will depend on the redshift range and the structure that exists in the 21 cm emission and absorption. Guided 
by figure 2 we have taken them to be 0.05, 0.03, 0.02 and 0.007 respectively. 



4 THE EXPECTED SIGNAL AND ITS CONNECTION TO THE DISTRIBUTION OF MATTER 
4.1 Signal-to-noise estimates 

We now need to determine whether there will be enough signal on the appropriate angular scales to produce a high fidelity 
map of the shear. This requires the noise to be significantly lower than "typical" values of the shear. We quantify the latter 
by calculating the root mean square value of the shear along random lines of sight. 

The distortion matrix introduced in section 2 can be written in terms of derivatives of the Newtonian potential, $\phi(x)$, 
along the light path. To a good approximation the unperturbed light path can be used (the first Born approximation) which 
results in 



with 

$$g(r,r') = \frac{D(r',0)\,D(r,r')}{D(r,0)}$$ 

where $r$ is the radial coordinate distance, $D(r,r')$ is the angular size distance between the two coordinate distances and 
$W(\theta;\delta\Theta)$ is still the angular window on the sky. Sometimes the distances to the source redshift, to the lens redshift and 
between them will be abbreviated as $D_s$, $D_l$ and $D_{ls}$, respectively. The coordinate vector perpendicular to the line-of-sight is 
$x_\perp$. The lensing convergence is 

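$$\kappa(\theta,z) = \frac{3}{2} H_0^2\,\Omega_m \int_0^{r(z)} dr'\,(1+z')\,g(r(z),r') \int d^2x_\perp\, W\!\left(\frac{x_\perp}{D(r',0)} - \theta;\delta\Theta\right) \delta(x_\perp,r') \qquad (50)$$ 

where $\delta(x)$ is the fractional density fluctuation. 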
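
To illustrate the geometric weight $g(r,r')$, the following sketch evaluates it under the assumption of a flat universe, where the comoving angular size distance between two coordinate distances reduces to $D(r,r') = r - r'$. The cosmological parameters ($\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, $H_0 = 70$ km/s/Mpc) and the function names are assumptions made for illustration.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h0=70.0, om=0.3, ol=0.7, n=1000):
    """Comoving distance in Mpc for a flat universe, using Simpson's rule (n even)."""
    def inv_e(x):
        return 1.0 / math.sqrt(om * (1.0 + x) ** 3 + ol)

    h = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return C_KM_S / h0 * total * h / 3.0


def lensing_weight(r_source, r_lens):
    """g(r, r') = D(r',0) D(r,r') / D(r,0), assuming D(r,r') = r - r' (flat case)."""
    if not 0.0 <= r_lens <= r_source:
        return 0.0
    return r_lens * (r_source - r_lens) / r_source


if __name__ == "__main__":
    for z_source in (1, 10, 100):
        r_s = comoving_distance(z_source)
        print(z_source, r_s, lensing_weight(r_s, 0.5 * r_s))
```

In this flat-space form the weight vanishes at the observer and at the source and peaks halfway between them in comoving distance, which is one reason a more distant source plane is sensitive to structure at higher redshift.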

To relate the variance in $\kappa$ to the power spectrum of matter fluctuations it is easiest to use the Fourier space Limber's 
equation (Kaiser 1992) and then to transform back to angular space. For a geometrically flat universe the result is 

where $E(z) = \sqrt{\Omega_m(1+z)^3 + \Omega_\Lambda}$ and $P_\delta(k,z)$ is the 3D power spectrum of matter fluctuations at redshift $z$ and $W(\ell;\delta\Theta)$ 
is the window in Fourier space. The first equality follows from the shear being a homogeneous potential field to first order. 
The window will be taken to be a gaussian to conform with our results in section 3.2. 

Figure 4 shows $\sigma_\kappa(z)$ as a function of source redshift for windows of different widths. The expected fluctuations in $\kappa$ are 
at the many percent level for redshifts between 10 and 300 (for a 1 arcmin pixel 4% to 6%, for a 3 arcsec pixel 7.5% to 11%). 
Reducing the pixel size can increase the signal substantially. By comparing this figure with figure 2 we see that for a pixel 

Figure 7. The mass detection limits for NFW halos as a function of redshift. The dotted curves are $1\sigma$ and the solid curves are $2\sigma$ 
detections of the tangential shear in a circle centered on the halo. The lowest set of four curves are for a pixel size of 0.05 arcmin, 
$\sigma_{\hat\kappa} = 0.007$ and source redshifts of 10 and 100. The middle set of four curves are for a pixel size of 0.5 arcmin and $\sigma_{\hat\kappa} = 0.02$. The convex 
features in the curves at high redshift are a result of the requirement that the circle be larger than the pixel size. The dashed curves are 
$2\sigma$ estimates for galaxy lensing surveys with the density of galaxies being, top, 35 arcmin$^{-2}$ (typical of an ambitious ground-based survey 
such as LSST or the DUNE satellite) and 100 arcmin$^{-2}$ (perhaps achievable over a small region with a satellite such as SNAP). 

size of 1 arcmin or smaller and with a moderate redshift range the irreducible noise per pixel is less than half the expected 
signal. Figure 5 shows which redshifts contribute most to $\sigma_\kappa^2(z)$ for source redshifts of 1, 10 and 100. It can be seen that 
structures above z = 2 contribute significantly in both 21 cm cases, whereas structure around z ~ 0.5 dominates in the galaxy 
lensing case. If the shear could be measured accurately using signals from both epochs of 21 cm emission/absorption, one 
could expect to isolate the contribution from structure at z ~ 10, since this contributes significantly to the signal for source 
redshift 100. In these calculations the nonlinear power spectrum was modeled using the Peacock & Dodds (1996) method 
with a normalization of $\sigma_8 = 0.75$. Note that for the smaller pixels especially, the distribution of $\kappa$ is strongly nongaussian, 
and the variance plotted in figure 4 is substantially larger than a typical fluctuation because of the long tail to high $\kappa$ values 
(see Hilbert et al. 2007). 

Figure 6 shows $\sigma_\kappa(\delta\Theta)$ and $\sigma_{\hat\kappa}(\delta\Theta)$ as functions of angular scale for observations averaged over patches larger than the 
pixel. The fluctuations in shear drop off relatively slowly with increasing angular scale while at scales much larger than the 
pixel size $\sigma_{\hat\kappa}(\delta\Theta) \propto \delta\Theta^{-1}$. As a result, even if the noise per pixel is comparable to $\sigma_\kappa$, the shear can still be mapped with high 
signal-to-noise on scales larger than the pixel. With small noise per pixel, the surface density averaged over large scales can 
be measured with high precision. Note that the normalizations of $\sigma_{\hat\kappa}(\delta\Theta)$ in this plot depend on the redshift range over which 
the 21 cm emission and absorption are measured. We have chosen representative values as listed in the caption. 

4.2 Detection of individual objects 

We have shown that $1\sigma$ fluctuations in the convergence could be detected with modest to good signal-to-noise (depending 
on pixel size) by a 21 cm experiment. Another interesting question is what kind of objects would be individually visible in 
a 21 cm shear map. To answer this question let us consider a circle of radius $\theta$ on the sky centered on a collapsed clump or 
halo. The average tangential shear on this circle is given by 

where $M(\theta)$ is the mass within the circle. The average tangential shear within a disk can be found from this. For halos with 
an NFW profile (Navarro, Frenk & White 1997) we find, as a function of virial mass and halo redshift, the radius where 
the signal-to-noise for the average tangential shear is maximized. The central density and the scale-size are set according to 
the NFW prescription. If the disk is smaller than the pixel, the tangential component of the shear will not be identifiable. 
Thus although these halos might cause a significant feature in the shear map, we will not consider a halo detected unless the 
signal-to-noise is above a $1\sigma$ or $2\sigma$ threshold within a circle with a radius at least as large as the pixel size. The halo mass 
detection limit is plotted in figure 7. With a 3 arcsec pixel this threshold is below $10^{\ldots}\,M_\odot$ almost all the way out to the 
redshift of the 21 cm emission/absorption and $< 2\times10^{\ldots}\,M_\odot$ below z = 1. This is smaller than the mass of the Milky Way 
halo today. Note that these are virial masses, not the mass enclosed within the circle. The latter is the directly detected mass 
and can be significantly smaller. 

Figure 8. The fractional error in the convergence power spectrum for a full-sky survey due to irreducible noise and cosmic variance. 
The solid and dashed curves are for 21 cm lensing with sources at z = 100 and z = 10 respectively, assuming a 0.5 arcmin pixel radius 
and $\sigma_{\hat\kappa}(0) = 0.03$. The two dot-dashed curves are for galaxy lensing surveys with median source redshift z = 1 and with 35 (upper) and 
100 (lower) galaxies per square arcmin. The dotted curve, which is just visible in the lower right corner but otherwise is covered by the 
solid curve, is the cosmic variance limit. For a smaller area survey these curves scale with the fraction of sky covered like $1/\sqrt{f_{\rm sky}}$ for 
modes that are smaller than the surveyed region. 

Taking the average tangential shear over a disk is not the best method for detecting halos. One could do somewhat better 
by assuming a model for their radial profiles and deriving an optimal weighting function (Schneider 1996). Here, however, 
we restrict ourselves to the question of what objects would be clearly visible in a shear map without any further special 
processing. 

For comparison we calculate a similar mass limit for idealized future galaxy lensing surveys. In this case the noise in 
the average shear in a patch of radius $\Theta$ is $\sigma_\gamma^2 = \sigma_\epsilon^2/(\pi\Theta^2 n_g)$ (half this for just the tangential component) where $n_g$ is the 
angular number density of background galaxies and $\sigma_\epsilon$ is the root-mean-squared intrinsic ellipticity of those galaxies; we use 
the standard estimate $\sigma_\epsilon = 0.3$. The shear strength depends on the redshift distribution of background galaxies with usable 
ellipticities. Here we model the redshift distribution as $\propto z^2 e^{-(z/z_o)^{1.5}}$, where $z_o$ is set by the desired median redshift. 
The shear (52) must then be averaged over the portion of this distribution that is at higher redshift than the lens plane. Halo 
detection limits calculated in this way are also shown in figure 7. 
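
The shape-noise estimate for galaxy lensing can be evaluated with a few lines. The sketch below implements $\sigma_\gamma^2 = \sigma_\epsilon^2/(\pi\Theta^2 n_g)$ with the variance halved for the tangential component alone; the function name and argument names are hypothetical.

```python
import math


def galaxy_shear_noise(patch_radius_arcmin, n_gal_per_arcmin2,
                       sigma_eps=0.3, tangential_only=False):
    """RMS noise in the average shear over a circular patch from intrinsic ellipticities."""
    variance = sigma_eps ** 2 / (math.pi * patch_radius_arcmin ** 2 * n_gal_per_arcmin2)
    if tangential_only:
        variance *= 0.5  # half the variance for just the tangential component
    return math.sqrt(variance)


if __name__ == "__main__":
    for n_g in (35.0, 100.0):  # galaxy densities considered in the text, per arcmin^2
        print(n_g, galaxy_shear_noise(1.0, n_g), galaxy_shear_noise(1.0, n_g, tangential_only=True))
```

For a patch of radius 1 arcmin this gives a shear noise of about 0.029 at 35 galaxies per square arcmin and about 0.017 at 100 per square arcmin, which can be compared with the percent-level irreducible noise per pixel quoted above for 21 cm lensing.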

A very deep space-based galaxy lensing survey might be competitive with a ~ 1 arcmin pixel 21 cm lensing survey for 
detecting halos at z < 1. The proposed satellite SNAP$^1$ is expected to survey ~ 2% of the sky with an expected galaxy 
density of $n_g$ ~ 100 arcmin$^{-2}$ and a median redshift z ~ 1.23. The DUNE$^2$ satellite proposes surveying ~ 50% of the sky with 
$n_g$ ~ 35 arcmin$^{-2}$ and a median redshift of z ~ 0.9. Several proposed ground-based surveys (LSST$^3$, PanSTARRS$^4$, VISTA$^5$) 
would cover comparable areas to DUNE at similar depth. These are the two cases shown in figure 7. Clearly higher redshifts 
will be accessible with 21 cm lensing. With a small pixel size, 21 cm lensing could detect all Milky Way mass halos in the 
universe! Based on the Sheth & Tormen (2002) halo mass function, for the same sky coverage ~ 600 times more objects could 
be identified by such a survey than in a space-based galaxy shear map with $n_g$ = 100 arcmin$^{-2}$ and ~ 3,500 times more than 
in a ground-based galaxy shear survey with $n_g$ = 30 arcmin$^{-2}$. Mass maps of galaxy clusters could be made with arcsecond 
resolution and high signal-to-noise, instead of with arcminute resolution and relatively low S/N as is possible using galaxy 
lensing. Galaxy halo studies, which now require stacking thousands of galaxies to measure a single average shear profile, could 
be carried out on individual galaxies. 



5 ESTIMATING COSMOLOGICAL PARAMETERS FROM THE LENSING POWER SPECTRUM 

As we have shown, high-resolution, high signal-to-noise shear maps could be made using 21 cm lensing. These maps will contain 
a wealth of information which can be used not only to learn about structure formation, but also to estimate cosmological 
parameters. We will make a preliminary foray into this latter topic in order to compare the power of 21 cm lensing to that of 
galaxy lensing. A useful study of the capability of planned galaxy lensing surveys for cosmological parameter estimation 

1 snap.lbl.gov 
2 www.dune-mission.net 
3 www.lsst.org 
4 pan-stars.ifa.hawaii.edu 
5 www.vista.ac.uk 

has recently been published by Amara & Refregier (2006) and we will adopt their survey parameters in the following in order 
to facilitate comparison between the two techniques. 

Figures 4 and 5 show clearly that the strength of gravitational lensing depends on source redshift. This suggests that 
additional information may be extracted by comparing shear maps derived from sources at different redshifts: either multiple 
21 cm source planes, or multiple galaxy source planes, or a combination of the two. Such weak lensing tomography has already 
been proposed for galaxy lensing surveys as a method to measure the evolution of structure and thereby to constrain the 
nature of dark energy (Hu & Tegmark 1999; Hu 2002; Heavens 2003; Castro, Heavens & Kitching 2005). In this context 21 cm 
lensing has the potential advantages of superior signal-to-noise, higher source redshift and better angular resolution. On the 
other hand, most models of dark energy affect structure formation and the cosmic expansion rate primarily at $z \lesssim 1$ where 
galaxy lensing tomography is most sensitive. As we show below, a combination of galaxy and 21 cm lensing appears likely to 
constrain dark energy parameters most effectively. 

For the purposes of cosmological parameter estimation it is convenient to work in spherical harmonic or Fourier space. 
The cross-correlation between the harmonic modes of two shear maps, corresponding to source planes at redshifts zi and Z2, 
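can be derived from equation (49) and is directly related to the power spectrum of density fluctuations 

$$\begin{pmatrix} \langle \gamma_1(\ell,z_i)\,\gamma_1(\ell',z_j) \rangle \\ \langle \gamma_2(\ell,z_i)\,\gamma_2(\ell',z_j) \rangle \\ \langle \gamma_1(\ell,z_i)\,\gamma_2(\ell',z_j) \rangle \end{pmatrix} = \begin{pmatrix} \cos^2(2\theta_\ell) \\ \sin^2(2\theta_\ell) \\ \cos(2\theta_\ell)\sin(2\theta_\ell) \end{pmatrix} \langle \kappa(\ell,z_i)\,\kappa(\ell',z_j) \rangle \qquad (53)$$ 

with 

$$\langle \kappa(\ell,z_i)\,\kappa(\ell',z_j) \rangle = (2\pi)^2\,\delta^2(\ell-\ell')\,P_\kappa^{ij}(\ell) \qquad (54)$$ 

and 

where $\ell$ and $\ell'$ are the multipole indices in the two maps. Using (53) the shear (cross-)power spectrum is trivially converted 
into the convergence (cross-)power spectrum. 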

The observed shear power spectrum of a lensing map contains a contribution from the irreducible noise, but this term is 
absent in the cross-correlation between maps for different source redshifts, since the noise fields in the two source planes are 
then independent. The power spectrum of the irreducible noise can be found from the analysis of section 3. 

(7W7(^')) = {2^fm^f')N.{l) (56) 
~ {2-Kf5\l-e.') ^-^= [ d^e ( (57) 



2{v2 - n)Au^'^ 

~ (27r)=^5^(£-0 4a?(0)/.y"d^e/(^)e-"* (58) 

where the function $f(x)$ is defined by equation (45) and in what follows it. An alternative approach to calculating this noise 
directly from visibility space is demonstrated in appendix C. 

The observed shear power spectrum including only the irreducible noise will be 

Ci'{£) = P:'{e)\Wi£)\* + N^{£) 5,, (59) 
~ P:'{£)e-''">''' +4a?{0)f,Se^fi£5e) 5,, . (60) 
~ P'^'{£)e-^^''^'^ + 7rcr?(0)56»' (1 + ) g-M'^V2 (resolution limited case) (61) 



As can be seen in figure 3, the angular correlation function $\xi_\nabla(\theta)$ has similar angular scale to the pixel. As a result $f(\ell\,\delta\theta)$ 
is close to unity for any $\ell \lesssim 1/\delta\theta$ and it decreases rapidly for larger $\ell$. We will restrict ourselves to modes larger than the 
pixel (i.e. $\ell\,\delta\theta \ll 1$), in which case both the exponential window factor and $f(\ell\,\delta\theta)$ will drop out of $C^{ij}(\ell)$. Line (61) shows the result for the 
pixel-dominated Poisson case discussed in section 3.2. 

The shear power spectrum from galaxy lensing has the same form except there is no pixel. It is often assumed that the 
noise in this case is dominated by the intrinsic ellipticities of galaxies, in which case the noise power spectrum is $N_\kappa(\ell) \simeq \sigma_\epsilon^2/n_g$ 
(Kaiser 1992). In practice errors in the photometric redshifts of the source galaxies are often important, but here we will assume 
an ideal survey where these are not significant. 

So far no assumption of Gaussian statistics in the shear field has been made in this section. Although our estimator $\hat\gamma(\theta)$ is 
not Gaussian, we have shown that the correlation length of the irreducible noise should be close to the pixel size. The multipole 
moments for scales larger than the pixel size are then sums of many independent variables and, by the central limit theorem, 
are expected to be approximately normally distributed. The shear map itself will also have substantial non-gaussianity caused 
by nonlinear structure, even for Gaussian initial density fluctuations. However, on scales larger than individual dark halos 
the shear map is expected to be close to Gaussian because of contributions from many independent structures along the long 
line-of-sight (Takada & Jain 2004). 

For a Gaussian shear map the likelihood function factorizes by mode, making the analysis much simpler. The Fisher 
matrix in this case is 

$$F_{ab} = \frac{1}{2} \sum_{\ell=\ell_{\rm min}}^{\ell_{\rm max}} (2\ell+1)\,f_{\rm sky}\,{\rm tr}\left[ C^{-1} C_{,a}\, C^{-1} C_{,b} \right], \qquad (62)$$ 

(see appendix A) where the indices $a$ and $b$ refer to parameters $p_a$ and $p_b$ and $f_{\rm sky}$ is the fraction of the sky 
covered (Hu 1999). The $f_{\rm sky}$ factor can be interpreted as the result of limited resolution in visibility-space because of the finite 
size of the radio telescope's pixel. It is assumed here that the coverage of the u-v plane is complete between $\ell_{\rm min}$ and $\ell_{\rm max}$ 
down to the resolution of the telescope. The smallest scale mode, $\ell_{\rm max}$, is chosen so that the Gaussian assumption remains 
approximately valid. The minimum variance unbiased estimator of $p_a$ then has statistical uncertainty $\sigma^2(p_a) = (F^{-1})_{aa}$, so 
this quantity can be used to indicate how well the parameter $p_a$ can be constrained. 
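
A minimal sketch of the Fisher sum for the simplest case is given below. It assumes a single shear map, so that $C$ is a scalar at each multipole and the trace reduces to a product of numbers; the tomographic case would need matrix inverses instead. The toy spectrum $C(\ell) = A\,\ell^{-2}$ and all function names are hypothetical, chosen only to exercise the formula.

```python
def fisher_matrix(ells, spectrum, derivatives, f_sky=1.0):
    """Fisher matrix for a single map, where C(l) is a scalar.

    spectrum:    function l -> C(l), signal plus noise
    derivatives: list of functions l -> dC/dp_a, one per parameter
    """
    n = len(derivatives)
    fisher = [[0.0] * n for _ in range(n)]
    for ell in ells:
        c = spectrum(ell)
        d = [deriv(ell) for deriv in derivatives]
        weight = 0.5 * (2 * ell + 1) * f_sky / (c * c)
        for a in range(n):
            for b in range(n):
                fisher[a][b] += weight * d[a] * d[b]
    return fisher


if __name__ == "__main__":
    # Toy example: noise-free spectrum C = A * l**-2 with fiducial amplitude A = 1.
    def toy_power(ell):
        return ell ** -2.0

    f = fisher_matrix(range(10, 1001), toy_power, [toy_power])
    print("sigma(A) =", f[0][0] ** -0.5)
```

In the noise-free single-parameter case every mode contributes equally, so the Fisher element is simply half the number of modes, $\frac{1}{2} f_{\rm sky} \sum_\ell (2\ell+1)$, and the amplitude error is set by mode counting alone.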

Directly from (62) one finds that the power spectrum of the fluctuations in $\kappa$ can be determined to accuracy 

(63) 

using only one epoch of 21 cm lensing. This formula holds on scales between that of the survey area, where windowing effects 
cause the noise to increase sharply, and that of the pixel size. Figure 8 shows the uncertainty in the $\kappa$ power spectrum given 
by this formula. Not shown in the figure is the $\ell$-space resolution, which is $\Delta\ell \sim 1/\sqrt{f_{\rm sky}}$. The errors in the power spectrum for $\ell$ 
separated by less than $\Delta\ell$ will be correlated (see appendix A for more details). The cosmic variance (or sample variance for 
a partial sky survey) is likely to dominate the uncertainty over all linear scales. This illustrates a fundamental limitation of 
measuring cosmological parameters from the convergence power spectra and cross-power spectra. Decreasing the instrumental 
and/or irreducible noise does not provide any further information about the ensemble power spectrum of $\kappa$, although it does 
provide more information on the particular realization that we live in. Modes with $\ell < 10^4$ will be cosmic variance limited 
if $\sigma_{\hat\kappa}(0)\,\delta\theta < 0.017$ arcmin for sources at z = 10. For modes with $\ell < 10^3$ the same is true if $\sigma_{\hat\kappa}(0)\,\delta\theta < 0.12$ arcmin. When 
estimating cosmological parameters there is no reason to decrease the noise below these values, as long as the Gaussian 
assumption holds and one is only interested in these modes. Nevertheless, this does not mean that the cosmic variance limit 
on the 3D power spectrum has been reached. More information can be gained by splitting the source redshift range up. This 
increases the noise for each subrange, but accesses the additional tomographic information that is averaged out when using 
the full redshift range to make a single shear map. 
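
The trade-off between noise and cosmic variance can be illustrated with the standard Gaussian mode-counting estimate, $\Delta P/P = \sqrt{2/[(2\ell+1) f_{\rm sky}]}\,(1 + N/P)$. This expression is an assumption here: it is what a Fisher sum over independent Gaussian modes gives for a single map and a single multipole, and it is not copied from the equation in the text. The function name is hypothetical.

```python
import math


def fractional_power_error(ell, f_sky=1.0, noise_over_signal=0.0):
    """Fractional error on the power at one multipole for Gaussian modes.

    noise_over_signal is N(l)/P(l); zero gives the cosmic variance limit.
    """
    cosmic_variance = math.sqrt(2.0 / ((2 * ell + 1) * f_sky))
    return cosmic_variance * (1.0 + noise_over_signal)


if __name__ == "__main__":
    for ell in (10, 100, 1000, 10000):
        print(ell, fractional_power_error(ell), fractional_power_error(ell, noise_over_signal=1.0))
```

Once $N/P$ is well below unity, lowering the noise further barely changes the error, which is the sense in which a mode becomes cosmic variance limited.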

To proceed we must choose a cosmological parameter space for exploration, a fiducial model to perturb around, and 
observational parameters for a set of representative surveys. For simplicity and for ease of comparison we follow the galaxy 
survey parameters chosen by Amara & Refregier (2006). In the current standard paradigm, the apparently accelerating 
expansion of the present universe is driven by dark energy, a near-uniform and dominant component of the cosmic energy 
density with effective equation of state $p = w\rho$ where $w < -1/3$ (Riess et al. 2004; Astier et al. 2006; Spergel et al. 2006). 
Dark energy modifies the lensing signal due to cosmic structure in two ways. It affects the angular size distance to a given 
redshift, which is given by the expression 

K^) = ^ i^ir , (64) 



Ho J, E{z') 
where 

$$E(z) = \left[ \Omega_m(1+z)^3 + \Omega_\Lambda(1+z)^{3(1+w)} + (1 - \Omega_m - \Omega_\Lambda)(1+z)^2 \right]^{1/2}. \qquad (65)$$ 

Here $\Omega_\Lambda$ denotes the dark energy density today in units of the critical density, and $w$ has been assumed to be constant. The 
second effect of dark energy results from its influence on the linear evolution of density fluctuations, which is given by 

$$\frac{d^2\delta}{d\ln a^2} + \left( 2 + \frac{d\ln E}{d\ln a} \right) \frac{d\delta}{d\ln a} = \frac{3\,\Omega_m}{2a^3E(a)^2}\,\delta .$$ 

In addition to $\Omega_m$, $\Omega_\Lambda$ and $w$, we include in our cosmological parameter set the logarithmic slope or spectral index of the 
primordial power spectrum, $n_s$, the density of baryons, $\Omega_b$, and the normalization of the power spectrum on large scales $A$, 
which is proportional to $\sigma_8$. The baryon oscillations in the power spectrum are not calculated, so $\Omega_b$ only affects the overall 
shape. Note that we do not restrict ourselves to flat cosmologies, but we do fix the Hubble constant to $H_0$ = 70 km/s/Mpc, 
assuming this to be externally determined. 
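
The two ingredients above, the expansion rate for constant $w$ and the linear growth of density fluctuations, can be combined in a short numerical sketch. It assumes the standard form of the growth equation in $\ln a$, $\delta'' + (2 + d\ln E/d\ln a)\,\delta' = \frac{3}{2}\,\Omega_m a^{-3} E^{-2}\,\delta$, and integrates it with a fourth-order Runge-Kutta scheme starting deep in the matter era, where $\delta \propto a$. The starting epoch, step count and function names are illustrative choices, not values from the text.

```python
import math


def expansion_rate(a, om=0.3, ol=0.7, w=-1.0):
    """E = H/H0 at scale factor a for constant w, allowing for curvature."""
    ok = 1.0 - om - ol
    return math.sqrt(om * a ** -3 + ol * a ** (-3.0 * (1.0 + w)) + ok * a ** -2)


def dlnE_dlna(a, om, ol, w, eps=1e-4):
    """Centered finite-difference estimate of d ln E / d ln a."""
    x = math.log(a)
    hi = math.log(expansion_rate(math.exp(x + eps), om, ol, w))
    lo = math.log(expansion_rate(math.exp(x - eps), om, ol, w))
    return (hi - lo) / (2.0 * eps)


def growth(a_end, om=0.3, ol=0.7, w=-1.0, a_start=1e-3, steps=2000):
    """Linear growth delta(a_end), normalized so that delta = a at a_start."""
    def rhs(x, d, v):
        a = math.exp(x)
        e = expansion_rate(a, om, ol, w)
        source = 1.5 * om / (a ** 3 * e * e)
        return v, -(2.0 + dlnE_dlna(a, om, ol, w)) * v + source * d

    x = math.log(a_start)
    h = (math.log(a_end) - x) / steps
    d, v = a_start, a_start  # growing mode: d(delta)/d(ln a) = delta
    for _ in range(steps):
        k1d, k1v = rhs(x, d, v)
        k2d, k2v = rhs(x + h / 2, d + h / 2 * k1d, v + h / 2 * k1v)
        k3d, k3v = rhs(x + h / 2, d + h / 2 * k2d, v + h / 2 * k2v)
        k4d, k4v = rhs(x + h, d + h * k3d, v + h * k3v)
        d += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        x += h
    return d


if __name__ == "__main__":
    for w in (-1.0, -0.9, -0.8):
        print(w, growth(1.0, w=w))
```

In a matter-only universe the growing mode stays exactly proportional to $a$, which provides a simple check of the integrator; with dark energy the growth is suppressed at late times, and the size of the suppression depends on $w$.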

Figures 9 and 10 show predicted error ellipses for various pairs of our set of six cosmological parameters and for various 
combinations of idealized 21 cm and galaxy lensing surveys. Whenever calculations are done for 21 cm lensing at a particular 

Figure 9. Predicted error ellipses ($\chi^2 = 2.2789$ or 68% probability) for six cosmological parameters. The fiducial model is $\Omega_m = 0.3$, 
$\Omega_\Lambda = 0.7$, $w = -1$, $n_s = 1$, $\Omega_b = 0.031$ and $\sigma_8 = 0.75$. The Hubble constant is fixed at $H_0$ = 70 km s$^{-1}$ Mpc$^{-1}$. All constraints are 
obtained by marginalizing over the 4 parameters not shown in each plot. The solid blue ellipses are for a full-sky galaxy survey with 
$n_g$ = 35 arcmin$^{-2}$, $\sigma_\epsilon = 0.3$ and a median redshift of z = 0.9. The dashed blue ellipses are for a deeper survey with $n_g$ = 100 arcmin$^{-2}$, 
$\sigma_\epsilon = 0.3$ and a median redshift of z = 1.23. It would take the SNAP satellite roughly 30 years to complete a full-sky survey to the deeper 
depth. In all cases the galaxies are divided into three redshift bins as described in the text. The solid green ellipses are for shear maps 
from 21 cm alone at redshifts z = 10 and 15. The dashed green ellipses are for the "optimal" 21 cm case with shear maps constructed 
for z = 10, 30 and 100. For these calculations we assume pixel radius $\delta\theta = 0.05$ arcmin and noise level $\sigma_{\hat\kappa}(0) = 0.02$, but the results 
are valid as long as $\sigma_{\hat\kappa}(0)\,\delta\theta < 0.017$ arcmin and $\delta\theta < 0.5$ arcmin because in this case cosmic variance dominates the noise for all the $\ell$ 
values used. Modes $\ell = 10$ to $\ell = 10^4$ were used in deriving these constraints. The solid red ellipses are for the shallower galaxy survey 
combined with a 21 cm lensing survey at z = 10 and 15. Finally, the solid black ellipse shows the "optimum" combination of the deeper 
galaxy lensing survey with 21 cm shear maps for z = 10, 30 and 100. Figure 10 shows blow-ups of these plots so that the inner regions 
can be seen better. The line types are summarized in Table 1. 

Table 1. Expected standard errors, $\sigma \times f_{\rm sky}^{1/2}$, in cosmological parameter estimates based on various lensing datasets. 

redshift the convergence is treated as if it were constant over the redshift range used in estimating it. For each plot we 
have marginalized over the remaining four parameters of our model set. In Table 1 we give corresponding $1\sigma$ uncertainties 
on individual parameters after marginalizing over the other five dimensions of our parameter space. The galaxy redshift 
distributions assumed here are the same as described at the end of section 4.2. When the galaxies are binned into several 
redshift intervals, we define these so as to obtain an equal number of galaxies in each bin. We also assume the full sky to be 
surveyed in all cases; for partial sky coverage the sizes of the uncertainties are approximately increased by a factor of $f_{\rm sky}^{-1/2}$. 
Apart from fixing the Hubble constant, no additional constraints from other observations are included. 

Figure 10. Blow-ups of the plots in figure 9. 



In agreement with Amara & Refregier (2006) we find that an ambitious galaxy lensing survey could determine $\Omega_\Lambda$ and $n_s$ 
with an accuracy of about 0.01, with an accuracy of about 0.004, $\Omega_m$ with an accuracy of about 0.0025 and $w$ with 
an accuracy of about 0.03. An ideal survey going to a depth corresponding to 100 source galaxies per square arcminute over 
the whole sky (requiring about 30 years with the specifications of the SNAP satellite) would reduce these uncertainties by 
about a factor of two. Surveys covering only a fraction $f_{\rm sky}$ of the sky would have uncertainties increased approximately in 
proportion to $f_{\rm sky}^{-1/2}$. 

While these numbers are impressive, shear maps derived from 21 cm alone can provide considerably tighter constraints. 
All-sky maps derived from the signal around z = 10 and z = 15 will be limited by cosmic variance if their resolution and 
noise properties satisfy $\sigma_{\hat\kappa}(0)\,\delta\theta < 0.017$ arcmin, and will then determine $\Omega_m$ to an accuracy of $4 \times 10^{-4}$, $\Omega_\Lambda$ to an accuracy 
of $3.5 \times 10^{-4}$, and $w$ to an accuracy of 0.004. An ideal survey with source planes around z = 10, 30 and 100 would reduce 
these even further, with $\Delta\Omega_m \sim 10^{\ldots}$, $\Delta\Omega_\Lambda \sim 0.0002$, $\Delta\Omega_b \sim 0.0005$, and $\Delta w \sim 0.001$. 

Some parameter constraints are substantially improved by combining galaxy and 21 cm lensing, although most of the 
statistical weight comes from the latter. Thus combining the shallower galaxy survey considered above with the 21 cm survey 
at z = 10 and 15, one finds that $\Omega_\Lambda$, $A$, $n_s$ and $\Omega_b$ are constrained about as well as by the 21 cm alone, while $w$ is constrained 
almost six times better and $\Omega_m$ four times better. Constraints on dark energy parameters are improved by including the 
galaxy lensing because dark energy primarily affects structure evolution at z < 1. On the other hand, galaxy lensing alone 
gives comparatively poor constraints on these parameters unless a prior constraint on $\Omega_m$ is included. For parameters that 
affect only the matter power spectrum (e.g. $n_s$) 21 cm lensing has a larger comparative advantage. Of course it is not a 
question of one or the other. Clearly it is worth doing both galaxy and 21 cm lensing surveys to maximize the information 
gained and to spread the risk from unanticipated systematics. 

It should be emphasized that this analysis does not exhaust the potential for constraining cosmological parameters using 
21 cm or galaxy lensing. The dark energy model used here is overly simplified and may be unrealistic; some more physically 
based models imply appreciable effects at redshifts well beyond unity and so may be particularly well constrained by 21 cm 
surveys (Caldwell et al. 2003; Doran, Schwindt & Wetterich 2001). Other datasets, notably CMB observations and supernova 
surveys, constrain cosmological parameters in different ways than gravitational lensing, and will be much improved by the 
time surveys of the type discussed in this section are completed. Combining results from all these sources will give stringent 
tests for the presence of systematics and will provide tighter and more robust final constraints if overall consistency is found. 
Our knowledge of many cosmological parameters is limited by degeneracies which are drastically reduced when different types 
of observation are combined in this way. 
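
The way combining independent data sets breaks parameter degeneracies can be illustrated with a minimal sketch. It assumes Gaussian likelihoods, so that the Fisher matrices of independent surveys simply add. The two matrices below are hypothetical illustrative numbers, not values from the forecasts in this section, and `marginalized_errors` is a name introduced only for this example.

```python
import numpy as np

def marginalized_errors(fisher):
    """1-sigma marginalized errors: sqrt of the diagonal of the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Hypothetical two-parameter Fisher matrices (illustrative numbers only).
# Survey 1 measures the combination p0 + p1 well, but p0 - p1 poorly.
fisher_1 = np.array([[50.5, 49.5],
                     [49.5, 50.5]])
# Survey 2 has the opposite degeneracy direction.
fisher_2 = np.array([[50.5, -49.5],
                     [-49.5, 50.5]])

# For independent data sets the log-likelihoods add, so the Fisher matrices add.
fisher_joint = fisher_1 + fisher_2

print("survey 1 alone:", marginalized_errors(fisher_1))
print("survey 2 alone:", marginalized_errors(fisher_2))
print("combined      :", marginalized_errors(fisher_joint))
```

In this toy case each survey alone leaves both parameters poorly determined after marginalization, while the joint errors are several times smaller than either, even though neither survey measures any single parameter well on its own.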
6 OBSERVATIONS

So far we have considered idealized observations where the irreducible noise dominates and the bandwidth is smaller than the 
intrinsic correlation length of the brightness temperature. This will be the best any experiment can do and, as we showed, 
will be reached when the noise in the temperature map is small compared to the temperature fluctuations in each frequency 
channel. This irreducible noise depends only on the shape of the temperature correlation function. Realistic observations, at 
least in the near future, will have foreground noise levels that are comparable to or larger than the intrinsic fluctuations in 
the brightness temperature. In this case the noise in the lensing map will depend more sensitively on the parameters of the 
telescope and on the level and statistical properties of the brightness temperature fluctuations. We now discuss these factors 
in more detail. 

The observations will be carried out with radio interferometers and thus in visibility space. As a result, when calculating 
the performance of telescopes it is easier to work in Fourier-space. For this section we adopt the formalism of appendix C for
convenience. Equations (C7) and (C10) give the noise in the $\kappa$ estimate as a function of the power spectrum of foreground
noise, $C_\ell^N(\nu)$, and the power spectrum of the brightness temperature, $C_\ell(\nu)$.

The noise in each visibility will have a thermal component and a component from imperfect foreground subtraction. We 
will model only the thermal component. If the telescopes in the array are uniformly distributed on the ground the average 
integration time for each baseline will be the same and the power spectrum of the noise will be 

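$$C_\ell^N = \frac{2\pi}{\Delta\nu\, t_o}\left(\frac{T_{\rm sys}\,\lambda}{f_{\rm cover}\,D_{\rm tel}}\right)^2 = \frac{(2\pi)^3\,T_{\rm sys}^2}{\Delta\nu\, t_o\, f_{\rm cover}^2\,\ell_{\rm max}(\lambda)^2} \qquad (67)$$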

(Zaldarriaga et al. 2004; Morales 2005; McQuinn et al. 2006) where $T_{\rm sys}$ is the system temperature, $\Delta\nu$ is the bandwidth, $t_o$
is the total observation time, $D_{\rm tel}$ is the diameter of the array and $\ell_{\rm max}(\lambda) = 2\pi D_{\rm tel}/\lambda$ is the highest multipole that can be
measured by the array as set by the largest baselines. $f_{\rm cover}$ is the total collecting area of the telescopes divided by $\pi(D_{\rm tel}/2)^2$,
the covering fraction. Other telescope configurations are possible which would make the noise unequally distributed in $\ell$, but
we will consider only this uniform configuration here. For our calculations we will use $T_{\rm sys} = 200$ K.
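
As a rough numerical illustration, the thermal noise level can be evaluated with a short sketch. It assumes the standard uniform-array form $C_\ell^N = (2\pi)^3 T_{\rm sys}^2 / (\Delta\nu\, t_o f_{\rm cover}^2 \ell_{\rm max}^2)$ with $\ell_{\rm max} = 2\pi D_{\rm tel}/\lambda$; the function names and the example parameter values are chosen here for illustration only.

```python
import math

C_LIGHT_M_S = 299_792_458.0      # speed of light, m/s
NU_21_HZ = 1420.405751e6         # rest frequency of the 21 cm line, Hz

def ell_max(d_tel_m, z):
    """Highest multipole sampled by an array of diameter d_tel_m for 21 cm emission at redshift z."""
    wavelength_m = C_LIGHT_M_S / NU_21_HZ * (1.0 + z)   # observed wavelength
    return 2.0 * math.pi * d_tel_m / wavelength_m

def thermal_noise_cl(t_sys_k, bandwidth_hz, t_obs_s, f_cover, d_tel_m, z):
    """Thermal noise power spectrum (K^2) for a uniformly filled array (assumed form)."""
    return ((2.0 * math.pi) ** 3 * t_sys_k ** 2
            / (bandwidth_hz * t_obs_s * f_cover ** 2 * ell_max(d_tel_m, z) ** 2))

# Example: an SKA-like core observed for 90 days in a 0.05 MHz channel at z = 10.
noise = thermal_noise_cl(t_sys_k=200.0, bandwidth_hz=0.05e6,
                         t_obs_s=90 * 86400.0, f_cover=0.02,
                         d_tel_m=6000.0, z=10.0)
print(f"ell_max = {ell_max(6000.0, 10.0):.0f},  C_l^N = {noise:.3e} K^2")
```

The noise scales as $f_{\rm cover}^{-2}$ and as $1/(\Delta\nu\, t_o)$, so doubling the covering fraction is worth a fourfold increase in observing time.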

There are several relevant telescopes proposed or under construction. The 21 Centimeter Array (21CMA, formerly known
as PAST) has $f_{\rm cover} \sim 0.1$ and $\ell_{\rm max} \sim 10^3$, giving it a resolution of about 10 arcmin. The Mileura Widefield Array (MWA) Low
Frequency Demonstrator (LFD) will operate in the 80-300 MHz range with $D_{\rm tel} = 1.5$ km and $f_{\rm cover} \sim 0.1$. For LOFAR (Low
Frequency Array) the core array will have $f_{\rm cover} \sim 0.016$ and $D_{\rm tel} \sim 2$ km. LOFAR's extended baselines, out to 350 km and
possibly larger, are not expected to be useful for high redshift 21 cm observations because of the small $f_{\rm cover}$ of the extended
array, although they will be used in foreground subtraction. It is anticipated that LOFAR will be able to detect 21 cm emission
out to a redshift of $z \sim 11.5$, but sensitivity limitations will make mapping very difficult. Plans for SKA (Square Kilometer
Array) have not been finalized, but it is expected to have $f_{\rm cover} \sim 0.02$ out to a diameter of ~ 6 km ($\ell_{\rm max} \sim 10^4$) and sparse
coverage extending out to 1,000-3,000 km. The lowest frequency currently anticipated is ~ 100 MHz which corresponds to
$z \sim 13$. It is anticipated that the core will be able to map the 21 cm emission with a resolution of $\delta\theta = \Delta\theta/2 \sim 0.5$ arcmin.
For reference, what we call the pixel-width is given by $\Delta\theta = 2\delta\theta \simeq \pi/\ell_{\rm max}$, or $1.08 \times 10^4/\ell_{\rm max}$ arcmin. One arcminute (fwhm)
corresponds to baselines of 7.9 km and 73 km at redshifts of 10 and 100 respectively. For our calculation we will concentrate
only on an SKA-like array with $D_{\rm tel} = 6$ km and a redshift range out to $z = 13$ since the smaller planned telescopes will not
be capable of mapping mass at high fidelity.
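
The conversions between multipole, angular resolution and baseline quoted above can be checked with a small sketch. The pixel width uses $\Delta\theta \simeq \pi/\ell_{\rm max}$ as in the text. The baseline function assumes the simple diffraction relation $\theta_{\rm fwhm} \approx \lambda/D$, which is an inference made here because it reproduces the quoted 7.9 km and 73 km; the function names are illustrative.

```python
import math

ARCMIN_PER_RAD = 180.0 * 60.0 / math.pi
LAMBDA_21_M = 0.21106            # rest wavelength of the 21 cm line, metres

def pixel_width_arcmin(ell_max):
    """Pixel width Delta-theta ~ pi / ell_max, converted to arcminutes."""
    return math.pi / ell_max * ARCMIN_PER_RAD

def baseline_km(fwhm_arcmin, z):
    """Baseline giving the requested fwhm, assuming theta ~ lambda / D (an assumption)."""
    wavelength_m = LAMBDA_21_M * (1.0 + z)
    return wavelength_m / (fwhm_arcmin / ARCMIN_PER_RAD) / 1000.0

print(f"pixel width for ell_max = 1e4: {pixel_width_arcmin(1.0e4):.2f} arcmin")
for z in (10, 100):
    print(f"z = {z:3d}: 1 arcmin fwhm needs a baseline of about {baseline_km(1.0, z):.1f} km")
```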

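The fluctuations in the brightness temperature depend on the spin temperature, the ionization and the density of HI
through

$$\delta T_b \simeq 24\,(1+\delta_b)\,x_H\left(\frac{T_s - T_{\rm CMB}}{T_s}\right)\left(\frac{1+z}{10}\right)^{1/2}\ {\rm mK} \qquad (68)$$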



(Field 1959; Madau, Meiksin & Rees 1997). As is commonly done, we will assume that the spin temperature is much greater
than the CMB temperature. This leaves fluctuations in the neutral fraction, $x_H$, and the baryon density, $\delta_b = (\rho_b - \bar\rho_b)/\bar\rho_b$,
as the sources of fluctuations. We will make the simplifying assumption that $x_H = 1$ until the universe is very rapidly and
uniformly reionized. Realistically, the reionization process will be inhomogeneous and may extend over a significant redshift
range. This will increase $C_\ell(\nu)$ by perhaps a factor of 10 on scales larger than the characteristic size of the ionized bubbles
and thus might be expected to reduce the noise in $\kappa$ significantly. However, we have derived the noise in the lensing map
by approximating the fourth order statistics of $\delta T_b$ as they would be for a Gaussian random field. If this is still a good
approximation the lensing noise will be reduced. This is uncertain, however, since during reionization the field will clearly not
be Gaussian, especially when the neutral fraction is low. A definitive resolution of these uncertainties will not be available
until the observations are done. Here we model fluctuations in the baryons in the same way as in section 3.2 with linear
structure formation and redshift distortions.

Figure 11. The signal-to-noise ratio, $\sigma_\kappa(\delta\theta, \Delta\nu)/\sigma_{\hat\kappa}(\delta\theta)$, for an SKA-like observation as a function of bandwidth and patch size. In
all cases the contours are 1, 2, 4 and 8$\sigma$ (when visible). The solid contours are for a covering fraction of $f_{\rm cover} = 0.025$ and the dashed
contours are for $f_{\rm cover} = 0.018$. The dotted contours are the signal-to-noise ratio for brightness temperature with the same telescope
parameters and $f_{\rm cover} = 0.018$. The other telescope parameters are $T_{\rm sys} = 200$ K, $D_{\rm tel} = 6$ km, $t_o = 90$ days and the universe is assumed
to be neutral in the redshift range $z = 7$ to 13 in all cases.

Figure 11 shows the signal-to-noise ratio, defined as $\sigma_\kappa(\delta\theta, \Delta\nu)/\sigma_{\hat\kappa}(\delta\theta)$, for our SKA-like telescope. For the assumptions
taken here the telescope should be able to make images ($2\sigma$) of the dark matter on 1.3 to 2.5 arcmin scales in 90 days
($f_{\rm cover} \sim 0.018$ to 0.025). These values are not too far from the optimal values and increasing the telescope's covering fraction
or resolution would markedly improve upon these.

Unlike in the irreducible-noise-only case shown in figure 6, when thermal noise is added the noise in $\hat\kappa$ does not go
to an asymptotic value at small $\delta\theta$. This is because the noise increases very rapidly near the maximum resolution of the
telescope because of a combination of effects. First, the intrinsic temperature power spectrum goes down at $\ell \gtrsim 1000$. $C_\ell(\nu)$
is also suppressed by a factor of $\sim 1/\Delta\nu$ when $\ell > D(z)/\delta r$, where $\delta r$ is the physical width corresponding to the bandwidth. In
addition, for a fixed baseline the highest resolution is attained for only a limited range of frequency, which limits the number
of redshift bins. With the parameters adopted, the cross-correlation in the temperature between frequency channels never
becomes important because the noise generally dominates when $\Delta\nu$ is small and it is assumed to have no cross-correlation
between channels.

As can be seen from figure 11, there is an optimal bandwidth for measuring lensing. This comes about because at large
bandwidths the number of independent frequency bins is limited. At small bandwidth the signal-to-noise goes down
because $C_\ell^N$ goes up faster than $C_\ell$ with decreasing $\Delta\nu$ for scales $\ell < D(z)/\delta r$. This optimal bandwidth is ~ 0.05 MHz for
our examples. If there is more structure on smaller scales, such as when there are ionized bubbles, the optimal bandwidth will
decrease.

The optimal bandwidth for lensing is generally smaller than the optimal bandwidth for measuring the brightness temperature
itself, as can also be seen in figure 11. At the optimal bandwidth the lensing map can have good fidelity while the
temperature map is noise dominated on the same scale. This is a somewhat counter-intuitive situation which reflects the fact 
that it is better to get more independent redshift slices at low signal-to-noise than to image the temperature in fewer channels. 
With a wider bandwidth the temperature can be imaged on the same angular scale as the mass distribution. This indicates 
that it may be advantageous to use several bandwidths simultaneously. 

There are many additional challenges to observing 21 cm radiation from high redshift. The galactic foreground from 
synchrotron emission is about four orders of magnitude brighter than the 21 cm signal at ~ 180 MHz and goes up with
decreasing frequency as $\nu^{-2.6}$. Both this emission and extragalactic foreground sources can, however, be cleaned from the
data because they are much smoother in frequency space (and also in position on the sky for the Galactic foreground) than 
the 21 cm signal itself. At large frequency separations foreground emission may also decorrelate. Generally, foregrounds pose 
no more of a problem for mass-mapping than for direct mapping of the 21 cm itself. Rapid increases in foreground emission 
and in the refractive index of the ionosphere with decreasing frequency make observations at higher redshift progressively
more difficult. The high-redshift 21 cm absorption ($z \gtrsim 30$) will be very difficult to observe and there are no mature plans to
do so at this time. The ionosphere is opaque below ~ 10 MHz or $z \gtrsim 150$, so in principle all lower redshifts are accessible from
the ground. In practice, the high, time-dependent index of refraction will make it difficult to go below ~ 60 MHz without
major advances in telescope technology. The ultimate high redshift 21 cm telescope would be located on the far side of the 
Moon where the absence of terrestrial interference or an ionosphere would allow access to higher redshifts. However, the large 
collecting area required would make this both technically challenging and expensive. 
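
The correspondence between observing frequency and redshift used throughout this section follows from $\nu_{\rm obs} = \nu_{21}/(1+z)$ with $\nu_{21} \simeq 1420.4$ MHz. A minimal sketch (function names are illustrative):

```python
NU_21_MHZ = 1420.405751   # rest frequency of the 21 cm line, MHz

def redshift_of(freq_mhz):
    """Redshift at which the 21 cm line is observed at freq_mhz."""
    return NU_21_MHZ / freq_mhz - 1.0

def frequency_of(z):
    """Observed frequency (MHz) of the 21 cm line emitted at redshift z."""
    return NU_21_MHZ / (1.0 + z)

for freq in (180.0, 100.0, 60.0, 10.0):
    print(f"{freq:6.1f} MHz  ->  z = {redshift_of(freq):6.1f}")
print(f"z = 30  ->  {frequency_of(30.0):.1f} MHz")
```

This reproduces the figures in the text to the precision quoted there: 100 MHz corresponds to $z \approx 13$, and the ionospheric cutoff near 10 MHz to a redshift of order 150.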

Much will depend on future instrument design and the as yet unknown characteristics of the 21 cm absorption/emission, 
particularly around the epoch of reionization. Despite this, the planned specifications for SKA may enable it to make high- 
fidelity maps of the matter distribution and if enough area can be surveyed very good statistical information should be 
accessible. Realistic upgrades to the collecting area and array size would greatly improve its ability to make mass maps. 



7 CONCLUSION 

We have shown that when low-frequency radio telescopes become sufficiently powerful to map the signal from high-redshift 
21 cm emission/absorption within a bandwidth of ~ 0.05 MHz, the data will necessarily be good enough to map the gravitational
shear due to foreground matter. Increasing the resolution of the telescope reduces the intrinsic noise in the shear
map both because the number of statistically independent redshift slices increases and because the number of independent
patches on the sky increases. As a result, 21 cm lensing offers the potential of producing high resolution, high signal-to-noise 
images of the cosmic mass distribution. Such images would be of enormous value for the study of cosmology and galaxy 
formation. 

For the specific problem of estimating cosmological parameters the requirements on resolution and redshift range are not 
particularly demanding, but survey area is of great importance. Even for a full-sky survey with a pixel of radius $\delta\theta = 1$ arcmin
(2 arcmin fwhm) and 10% noise per pixel, the shear power spectrum would be cosmic variance dominated up to $\ell \sim 10^3$.
The cosmic variance limit is probably achievable up to $\ell \sim 10^4$ with an array of ~ 5 km in diameter and a covering factor
of several percent. Cross-correlating several redshift slices with each other and with galaxy lensing surveys over a significant 
portion of the sky would begin a new era of very high precision cosmology. 

The study of structure formation would benefit particularly from higher resolution observations, however. If a resolution 
of ~ 6 arcsec (fwhm) could be achieved, every halo more massive than the Milky Way's would be clearly visible back to z ~ 10. 
Even with a resolution of ~ 1 arcmin (fwhm) all the halos ^ 2 x 10^^ M© should be individually detected. Connecting these 
mass maps to images of emission at other wavelengths would provide a tremendous wealth of information about the evolution 
of structure and the formation of galaxies. 

RBM would like to thank B. Ciardi, P. Madau and H. Sandvik for very useful discussions. We would also like to thank U. Seljak 
and O. Zahn for very useful comments. 
APPENDIX A: PARAMETER ESTIMATION AND VISIBILITY SPACE

The observations of 21 cm emission/absorption will be done with radio interferometers so it is appropriate to make an explicit 
connection between the observables from such an instrument and the quantities used in this paper. The flat sky approximation 
greatly simplifies the mathematics and is well justified for the angular scales of interest here. A radio interferometer measures 
the visibility, $V(\mathbf{u})$, which in our case is related to the brightness temperature by

$$V(\mathbf{u}) = \int d^2\theta\, A(\boldsymbol{\theta})\,T(\boldsymbol{\theta})\,e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}} \qquad (A1)$$

$$= \int d^2w\, \tilde A(-\mathbf{w})\,\tilde T(\mathbf{u} - \mathbf{w}) \qquad (A2)$$

where $A(\boldsymbol{\theta})$ is the primary beam of the telescopes, which is typically normalized to one at its peak ($\theta = 0$); this gives its
Fourier transform, $\tilde A(\mathbf{u})$, a normalization of one in u-space. Units of temperature are used here (flux density units have a
factor of $2k_B/\lambda^2$). Tildes signify Fourier transforms. The size of the primary beam dictates the area covered on the sky in one
"pointing" of the telescopes. The separation of the antennas and the position on the sky dictate $\mathbf{u}$.
The correlation in the visibilities is given by 

$$C^V_{ij} = \langle V^*(\mathbf{u}_i)\,V(\mathbf{u}_j)\rangle \qquad (A3)$$

$$= \int d^2w\, \tilde A^*(\mathbf{u}_i - \mathbf{w})\,\tilde A(\mathbf{u}_j - \mathbf{w})\,S(\mathbf{w}) + \delta_{ij}\,C^N \qquad (A4)$$

$$\simeq S(\mathbf{u}_i)\,W_{ij} + \delta_{ij}\,C^N \qquad (A5)$$

where $S(\mathbf{u})$ is the intrinsic (cross-)power spectrum of the temperature and $C^N$ is the noise. It has been assumed that the
temperature field is isotropic. Equation (A5) defines the window, $W_{ij}$, that makes the visibilities correlated. This in effect
defines the resolution in u-space. Correlations will only exist when $\Delta u = |\mathbf{u}_i - \mathbf{u}_j|$ is less than twice the width of $\tilde A(\mathbf{u})$, i.e. the
smaller the telescopes are the narrower $\tilde A(\mathbf{u})$ will be and the higher the resolution. This width will be denoted $\sigma_u$. It has
been assumed in equation (A5) that the intrinsic power spectrum does not change very much on the scale of $\sigma_u$. The coordinate
$\mathbf{u}$ is conjugate to the angle on the sky so the resolution is linked to the area covered on the sky by $f_{\rm sky} \simeq (2\pi\sigma_u)^{-2}$.
The visibility power spectrum can be related to the spherical harmonic power spectrum through
The visibility power spectrum can be related to the spherical harmonic power spectrum through 

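$$u^2 S(u) = \frac{u}{2\pi}\sum_\ell (2\ell+1)\,C_\ell\,J_{2\ell+1}(4\pi u) \qquad (A6)$$

$$\simeq \frac{\ell(\ell+1)}{(2\pi)^2}\,C_\ell\,\Big|_{\ell = 2\pi u} \qquad (A7)$$

where the approximation in the second line is very good for $\ell \gtrsim 60$ (White et al. 1999). If there were just one visibility measured the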
Fisher matrix would be 

$$F(\mathbf{u})_{ab} = \frac{1}{2}\,{\rm tr}\left[(C^V)^{-1}\,C^V_{,a}\,(C^V)^{-1}\,C^V_{,b}\right] \qquad (A8)$$
(see Tegmark, Taylor & Heavens 1997, for a review of likelihood methods in astronomy). Visibilities within $\sim \sigma_u$ of each
other will be correlated, but an estimate of the total Fisher matrix can be made by assuming one measurement per correlated
region which gives 

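$$F_{ab} = \sum_{\mathbf{u}} F(\mathbf{u})_{ab} \simeq \int \frac{d^2u}{\pi\sigma_u^2}\,F(\mathbf{u})_{ab} \qquad (A9)$$

$$\simeq \frac{1}{(2\pi\sigma_u)^2}\sum_{\ell m} F(\ell = 2\pi u)_{ab} \qquad (A10)$$

$$\simeq f_{\rm sky}\sum_\ell (2\ell+1)\,F(\ell = 2\pi u)_{ab}. \qquad (A11)$$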

This is the formula (62) in section 5 used to calculate cosmological parameter constraints. It can be seen that the $f_{\rm sky}$
factor comes from the correlations or resolution in visibility space. A more sophisticated treatment would allow for partially
correlated visibilities within $\sigma_u$ of each other which would reduce the noise further. The Fisher matrix is an estimate of the
expected inverse correlation matrix in the model parameters at the maximum likelihood solution. Thus $(F^{-1})_{aa}$ is an estimate
of the squared error in the parameter $a$ after marginalizing over the other parameters. This formalism is outlined here in terms of the
temperature, but it is equally valid for the lensing shear or convergence.
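
The sum over multipoles weighted by $f_{\rm sky}(2\ell+1)$ can be sketched numerically for a single scalar field, for which the trace in the single-mode Fisher matrix reduces to $\frac{1}{2}\,\partial_a C_\ell\,\partial_b C_\ell / C_\ell^2$. The power-law spectrum, its parameters and the white noise level below are hypothetical and chosen only to make the example run; they are not the models used in the forecasts.

```python
import numpy as np

def fisher_matrix(ells, cl_signal, cl_noise, dcl_dparams, f_sky):
    """F_ab = f_sky * sum_l (2l+1) * 0.5 * dC_l/da * dC_l/db / C_l^2 for one scalar field."""
    cl_total = cl_signal + cl_noise
    weights = f_sky * (2.0 * ells + 1.0) / (2.0 * cl_total ** 2)
    return np.einsum("l,al,bl->ab", weights, dcl_dparams, dcl_dparams)

# Toy model (assumed): C_l = A * (l / 100)^n, with parameters (ln A, n).
ells = np.arange(10, 1001, dtype=float)
amplitude, tilt = 1.0e-9, -1.2
cl = amplitude * (ells / 100.0) ** tilt
derivatives = np.array([cl,                           # dC_l / d ln A
                        cl * np.log(ells / 100.0)])   # dC_l / d n
noise = np.full_like(ells, 1.0e-10)                   # white noise, illustrative level

def errors(f_sky):
    """Marginalized 1-sigma errors on (ln A, n) for a given sky fraction."""
    fisher = fisher_matrix(ells, cl, noise, derivatives, f_sky)
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

print("full sky   :", errors(1.0))
print("f_sky = 1/4:", errors(0.25))   # twice the full-sky errors
```

Because the Fisher matrix is linear in $f_{\rm sky}$, the marginalized errors scale as $f_{\rm sky}^{-1/2}$, which is the scaling quoted for partial sky coverage.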

The size of individual antennas in future radio telescope arrays is expected to be of order a few wavelengths or smaller
as in the case of dipole antennas. In this case the primary beam covers almost the whole hemisphere. However, subtracting 
interference and handling the huge data rate will probably require synthesizing a much smaller beam. In addition, the 
subtraction of galactic foregrounds will probably not be possible in some regions near the galactic plane. For these reasons 
the sky fraction, $f_{\rm sky}$, and the shape of the observed fields will be limited for a single pointing of the telescope beam. The sky
fraction and thus the $\ell$-space resolution can be increased with multiple pointing or mosaicking.



APPENDIX B: LENSING IN FOURIER SPACE 

The temperature on the sky is to first order 

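$$T(\boldsymbol{\theta},\nu) = \tilde T(\boldsymbol{\theta},\nu) + \nabla\Phi(\boldsymbol{\theta})\cdot\nabla\tilde T(\boldsymbol{\theta},\nu) \qquad (B1)$$

where $\tilde T(\boldsymbol{\theta},\nu)$ is the temperature before lensing and $\Phi(\boldsymbol{\theta})$ is the lensing potential defined so that the deflection angle is
$\boldsymbol{\alpha}(\boldsymbol{\theta}) = \nabla\Phi(\boldsymbol{\theta})$. The Fourier transform of this is

$$T(\boldsymbol{\ell},\nu) = \tilde T(\boldsymbol{\ell},\nu) - \int \frac{d^2\ell'}{(2\pi)^2}\,\boldsymbol{\ell}'\cdot(\boldsymbol{\ell}-\boldsymbol{\ell}')\,\Phi(\boldsymbol{\ell}')\,\tilde T(\boldsymbol{\ell}-\boldsymbol{\ell}',\nu) \qquad (B2)$$
and, as a result, to first order

$$\langle T(\boldsymbol{\ell},\nu)\,T^*(\boldsymbol{\ell}-\mathbf{L},\nu')\rangle = \left[\mathbf{L}\cdot\boldsymbol{\ell}\;C_\ell(\nu,\nu') + \mathbf{L}\cdot(\mathbf{L}-\boldsymbol{\ell})\;C_{|\boldsymbol{\ell}-\mathbf{L}|}(\nu,\nu')\right]\Phi(\mathbf{L}) \qquad (B3)$$

where $\mathbf{L} \neq 0$ and the average is over realizations of the temperature field while keeping the lensing potential fixed (Hu &
Okamoto 2002). Note that if the noise is homogeneous it will drop out of this equation and the $C_\ell$'s will be only the power
spectrum of the fluctuations in brightness temperature.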
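
The mode-coupling coefficient multiplying the lensing potential in the first-order relation above can be written as a short function. This is a sketch for a single frequency pair, with the power spectrum passed in as a callable; the name `lensing_response` and the test spectra are introduced here for illustration.

```python
import numpy as np

def lensing_response(ell_vec, big_l_vec, cl_func):
    """Coefficient of Phi(L) in <T(l) T*(l - L)>:
    L.l C_l + L.(L - l) C_|l - L|, for 2-D vectors l and L."""
    ell_vec = np.asarray(ell_vec, dtype=float)
    big_l_vec = np.asarray(big_l_vec, dtype=float)
    diff = big_l_vec - ell_vec                      # L - l
    return (big_l_vec @ ell_vec) * cl_func(np.linalg.norm(ell_vec)) \
         + (big_l_vec @ diff) * cl_func(np.linalg.norm(diff))

# For a white (scale-independent) spectrum the two terms combine to C * |L|^2.
white = lambda ell: 2.0
print(lensing_response([300.0, 400.0], [50.0, -20.0], white))

# A falling power-law spectrum (hypothetical) gives a direction-dependent response.
power_law = lambda ell: 1.0e-9 * (ell / 100.0) ** -1.2
print(lensing_response([300.0, 400.0], [50.0, -20.0], power_law))
```

The white-spectrum case is a useful sanity check: the terms linear in $\boldsymbol{\ell}$ cancel, leaving $C\,|\mathbf{L}|^2$ regardless of the direction of $\boldsymbol{\ell}$.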

The observed temperature is always binned into frequency channels or bands and smoothed by the telescope's beam. 
This observed temperature is 

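$$T(\boldsymbol{\ell},\nu) = \int dr'\, q_\nu(r',\Delta\nu)\int d^2\ell'\,\tilde A(\boldsymbol{\ell}'-\boldsymbol{\ell})\,T(\boldsymbol{\ell}',\nu') \qquad (B4)$$

where $q_\nu(r',\Delta\nu)$ is the response function for the band centered on $\nu$, $T_{21}$ is the 3D Fourier transform of the brightness
temperature and

$$\tilde q(k,\nu) = \int dr'\, q_\nu(r',\Delta\nu)\,e^{ikr'} \qquad (B6)$$

$$= e^{ikr(\nu)}\,\frac{\sin\left[k\,\delta r(\nu,\Delta\nu)/2\right]}{k\,\delta r(\nu,\Delta\nu)/2} \qquad (B7)$$

with, writing $\nu_{21}$ for the rest-frame frequency of the 21 cm line,

$$\delta r(\nu,\Delta\nu) = \frac{2c}{H_0\sqrt{\Omega_m\,\nu_{21}}}\left(\sqrt{\nu+\Delta\nu/2} - \sqrt{\nu-\Delta\nu/2}\right). \qquad (B8)$$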
In (B7) the response function is taken to be a boxcar shape with sharp edges at $\nu + \Delta\nu/2$ and $\nu - \Delta\nu/2$. In (B8) the fact
that the universe is matter dominated at the time of the 21 cm emission/absorption is used to express the radial distances in
terms of frequency.
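
To give a feeling for the numbers, the radial width of a frequency channel and the corresponding boxcar window can be sketched as follows. The sketch assumes the matter-dominated distance relation $r(\nu) = (2c/H_0\sqrt{\Omega_m})\,[1 - (\nu/\nu_{21})^{1/2}]$ and a unit-normalized boxcar; the cosmological parameter values ($H_0 = 70$ km/s/Mpc, $\Omega_m = 0.3$) and the function names are assumptions made here for illustration.

```python
import math

C_KM_S = 299_792.458       # speed of light, km/s
NU_21_MHZ = 1420.405751    # rest frequency of the 21 cm line, MHz

def comoving_distance_md(freq_mhz, h0=70.0, omega_m=0.3):
    """Matter-dominated comoving distance (Mpc) to 21 cm emission observed at freq_mhz.
    Only differences of this quantity at high redshift are used below."""
    prefactor = 2.0 * C_KM_S / (h0 * math.sqrt(omega_m))
    return prefactor * (1.0 - math.sqrt(freq_mhz / NU_21_MHZ))

def radial_width(freq_mhz, bandwidth_mhz, h0=70.0, omega_m=0.3):
    """Comoving radial width (Mpc) of a channel of width bandwidth_mhz centred on freq_mhz."""
    near = comoving_distance_md(freq_mhz + 0.5 * bandwidth_mhz, h0, omega_m)
    far = comoving_distance_md(freq_mhz - 0.5 * bandwidth_mhz, h0, omega_m)
    return far - near

def boxcar_window(k, delta_r):
    """Amplitude of the Fourier transform of a unit-normalized boxcar of width delta_r."""
    x = 0.5 * k * delta_r
    return 1.0 if x == 0.0 else math.sin(x) / x

freq_z10 = NU_21_MHZ / 11.0                      # observed frequency for z = 10
dr = radial_width(freq_z10, 0.05)
print(f"0.05 MHz at z = 10 spans about {dr:.2f} comoving Mpc")
print(f"window at k = 1 / Mpc: {boxcar_window(1.0, dr):.3f}")
```

Radial modes with $k \gtrsim 2\pi/\delta r$ are strongly suppressed by the window, which is why widening the channel removes small-scale power along the line of sight.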

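As a result of beam smearing relation (B3) will become

$$\langle T(\boldsymbol{\ell},\nu)\,T^*(\boldsymbol{\ell}-\mathbf{L},\nu')\rangle = (2\pi)^2\int d^2\ell'\,\tilde A(\boldsymbol{\ell}')\,\tilde A^*(\boldsymbol{\ell}'+\mathbf{L})\,C_{|\boldsymbol{\ell}+\boldsymbol{\ell}'|}(\nu,\nu') \qquad (B9)$$

$$+ \int d^2\ell''\,\Phi(\boldsymbol{\ell}'')\int d^2\ell'\,\tilde A(\boldsymbol{\ell}')\,\tilde A^*(\boldsymbol{\ell}'-\boldsymbol{\ell}''+\mathbf{L})\left\{\boldsymbol{\ell}''\cdot(\boldsymbol{\ell}+\boldsymbol{\ell}')\,C_{|\boldsymbol{\ell}+\boldsymbol{\ell}'|}(\nu,\nu') - \boldsymbol{\ell}''\cdot(\boldsymbol{\ell}+\boldsymbol{\ell}'-\boldsymbol{\ell}'')\,C_{|\boldsymbol{\ell}+\boldsymbol{\ell}'-\boldsymbol{\ell}''|}(\nu,\nu')\right\} \qquad (B10)$$

where the correlations between temperature modes before lensing and beam smearing are

$$C_\ell(\nu,\nu') = \frac{1}{D(\nu)^2}\int dk\; P_{21}\!\left(\sqrt{\frac{\ell^2}{D(\nu)^2}+k^2},\; z(\nu)\right)\tilde q(k,\nu)\,\tilde q^*(k,\nu'). \qquad (B11)$$

It is assumed that $D(\nu)$ does not change significantly between frequencies that are significantly correlated. A more rigorous
derivation of (B11) is in spherical harmonic space, as has been done by Zaldarriaga et al. (2004), but for the scales of importance
here the difference is very small and (B11) is considerably easier to evaluate.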

There are two effects that make (B9) and (B10) different from the Hu & Okamoto (2002) result (B3). The first term (B9)
represents an aliasing effect caused by the finite size of the beam. This will cause a false signal on scales approaching the size
of the beam or surveyed region, $L \lesssim 2\pi\sigma_u$, that will need to be subtracted. The second term (B10) is a kind of smoothing of
the lensing potential over a scale of $\sim 2\pi\sigma_u$. In the limit of a very narrow beam in Fourier space, i.e. a large area in angle, the relation (B3) is
recovered except with the frequency binned power spectra. Thus the observations will really measure a lensing potential that
is smoothed in Fourier-space in a rather complicated fashion.



APPENDIX C: CONVERGENCE ESTIMATORS 

In the main text of this paper we used real-space estimators for the shear and convergence, $\hat\gamma_i(\boldsymbol{\theta})$. We consider this the most
intuitive and instructive approach. In the weak lensing limit the shear map can be converted to a convergence map because
they are both related to a single lensing potential by differential operators. This is commonly done for galaxy lensing surveys
(see Bartelmann & Schneider (2001) for a review). The most straightforward method is to Fourier transform the shear map,
multiply by $\ell$-dependent factors and then transform back to a convergence map (Kaiser & Squires 1993). Averaging this with
the $\hat\gamma_3$ map would produce a convergence map with less noise than the $\hat\gamma_3$ map alone. However, the gain in noise will not be
as great as in the galaxy lensing case because in the 21 cm lensing case the Fourier modes of $\hat\gamma_1$ and $\hat\gamma_2$ are correlated, unlike
in the galaxy case. Probably a more practical approach from a technical point of view is to go directly from visibility space
to a convergence map in real-space.

Many convergence estimators in visibility or Fourier space are possible. Our real-space estimators can be Fourier trans- 
formed to make a set of estimators for the Fourier modes of shear and convergence, but the Fourier estimator of Hu & Okamoto 
(2002) has the advantage of having the lowest noise level for one frequency bin if the temperature distribution is Gaussian 
and the beam is infinitely large (in angle). Zahn & Zaldarriaga (2006) find an estimator in both angular Fourier-space and 
frequency Fourier-space which is optimal with the added assumption that the frequency Fourier modes are statistically inde- 
pendent and Gaussian distributed. The statistical independence of the modes will break down because of binning in frequency 
and to a lesser extent because of the finite range in frequency. For this reason it is difficult to determine how bandwidth will 
affect the noise in their estimator. Instead we choose to use the Hu & Okamoto (2002) estimator for each frequency band and 
then weight each band. 

We consider a second-order estimator for the shear or convergence of the form 



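$$\hat\gamma_i(\mathbf{L}) = \sum_\nu\int d^2\ell\;\Gamma_i(\boldsymbol{\ell},\mathbf{L},\nu)\,T(\boldsymbol{\ell},\nu)\,T^*(\boldsymbol{\ell}-\mathbf{L},\nu), \qquad (C1)$$

where, as in the main text, $\hat\gamma_{1,2}$ are estimators for the two components of shear and $\hat\gamma_3$ is an estimator for the convergence. In
this case the estimators are of the form

$$\Gamma_i(\boldsymbol{\ell},\mathbf{L},\nu) = D_i(\mathbf{L})\,w(\nu,\mathbf{L})\,\chi(\boldsymbol{\ell},\mathbf{L},\nu) \qquad (C2)$$
where

$$D_i(\mathbf{L}) = \begin{cases} L_1^2 - L_2^2 & i = 1 \\ 2L_1L_2 & i = 2 \\ L_1^2 + L_2^2 & i = 3 \end{cases} \qquad (C3)$$
The function $\chi(\boldsymbol{\ell},\mathbf{L},\nu)$ is proportional to the Hu & Okamoto (2002) estimator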



(C4) 



where $C^T_\ell(\nu)$ is the power spectrum of the actual temperature while $C_\ell(\nu) = C^T_\ell(\nu) + C^N_\ell(\nu)$ is the observed power spectrum
which includes noise and $w(\nu, \mathbf{L})$ is a weight that is to be determined.

In the limit of an infinitely large beam the correlation between modes is 



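$$\langle\hat\gamma_i(\mathbf{L})\,\hat\gamma_j^*(\mathbf{L}')\rangle = 2(2\pi)^2\,\delta(\mathbf{L}-\mathbf{L}')\,D_i(\mathbf{L})\,D_j(\mathbf{L})\sum_{\nu,\nu'}w(\nu)\,w(\nu')\int d^2\ell\;\chi(\boldsymbol{\ell},\mathbf{L},\nu)\,\chi(\boldsymbol{\ell},\mathbf{L},\nu')\,C_\ell(\nu,\nu')\,C_{|\boldsymbol{\ell}-\mathbf{L}|}(\nu,\nu') \qquad (C5)$$

$$= (2\pi)^2\,\delta(\mathbf{L}-\mathbf{L}')\,D_i(\mathbf{L})\,D_j(\mathbf{L})\,N(\mathbf{L}) \qquad (C6)$$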



It has been assumed here that the temperature is Gaussian distributed so that the fourth moment can be written as products
of second moments. The finite beam will cause correlations in the noise, $\langle\hat\gamma_i(\mathbf{L})\,\hat\gamma_i(\mathbf{L}+\delta\mathbf{L})\rangle \neq 0$, when $\delta L \lesssim 2\pi\sigma_u$. The expression
for these correlations is lengthy, but easily worked out. In general there will be a correlation between the modes for different
components of shear, unlike in real-space. An image can be constructed by Fourier transforming (C1) to real-space with a
smoothing window. The noise at a point on this image will be

$$\sigma^2_{\hat\gamma_i}(\delta\theta) = \int\frac{d^2L}{(2\pi)^2}\;|W(\mathbf{L},\delta\theta)|^2\,D_i(\mathbf{L})^2\,N(\mathbf{L}) \qquad (C7)$$

where $W(\mathbf{L},\delta\theta)$ is the Fourier transform of the smoothing function. This implies that $\sigma^2_{\hat\gamma_3}(\delta\theta) = 2\sigma^2_{\hat\gamma_1}(\delta\theta) = 2\sigma^2_{\hat\gamma_2}(\delta\theta)$, in
contrast to the noise in the real-space estimators.

As in the real-space version, the optimal frequency weights, $w(\nu)$, are complicated in general, but they simplify if we
make the approximation that each frequency bin is statistically independent. If we minimize the diagonal entries of (C5) while
requiring that the average of (C1) with (B3) reproduce the shears and the convergence, we find
(C8)

These weights can be reinserted in expression (C5), but now allowing for correlations between frequencies,

$$\cdots\;(2\pi)^2\;\cdots\int d^2\ell\;\chi(\boldsymbol{\ell},\mathbf{L},\nu)\,\chi(\boldsymbol{\ell},\mathbf{L},\nu')\,C_\ell(\nu,\nu')\,C_{|\boldsymbol{\ell}-\mathbf{L}|}(\nu,\nu') \qquad (C9)$$

This can be rewritten in the suggestive form

(C10)

where

$$N^{\rm eff}_\nu(\mathbf{L})\;\cdots\;\sum_\nu\int d^2\ell\;\chi(\boldsymbol{\ell},\mathbf{L},\nu)^2\,C_\ell(\nu)\,C_{|\boldsymbol{\ell}-\mathbf{L}|}(\nu)\;\cdots\;(\nu_2 - \nu_1)\;\cdots \qquad (C11)$$
which is essentially the Zahn & Zaldarriaga (2006) estimator except for the $1/N^{\rm eff}_\nu(\mathbf{L})$ factor. $N^{\rm eff}_\nu(\mathbf{L})$ is the effective number
of independent frequency bins. Line (C12) is an alternative definition of the frequency correlation length. If the frequency bins
are uncorrelated $N^{\rm eff}_\nu(\mathbf{L}) = N_\nu$ and $\Delta\nu_c = \Delta\nu$, but if there are correlations between bins $N^{\rm eff}_\nu(\mathbf{L}) < N_\nu$ and $\Delta\nu_c > \Delta\nu$. The
irreducible noise limit discussed in section 3 corresponds to the case where $C^T_\ell(\nu) = C_\ell(\nu)$ and to infinitely narrow frequency
bins.

In actual data the temperature distribution will not be Gaussian, the foregrounds will not be perfectly subtracted which 
will produce spurious correlations in frequency, there will be holes in the surveyed area caused by point source subtraction, 
there will be a finite and irregular beam, and the coverage of the u-v plane will not be complete. All these complications make
it unclear at this time what estimator will be the best choice for real data.



